THE ADVANCED 
THEORY OF STATISTICS 


See also Vol. II of this book:
THE ADVANCED THEORY OF STATISTICS, VOL. II


By M. G. Kendall, M.A.


SECOND EDITION. Crown Quarto. Pp. ix + 521. 30 illustrations and 52 tables.


Price 50s. net.




THE ADVANCED 
THEORY OF STATISTICS 


MAURICE G. KENDALL, M.A. 


Assistant General Manager and Statistician to the Chamber
of Shipping of the United Kingdom.


VOLUME I


With 16 Illustrations and 79 Tables 


FOURTH EDITION 


LONDON 
CHARLES GRIFFIN & COMPANY LIMITED 
42 DRURY LANE 
1948 


[All Rights Reserved] 




Printed in Great Britain 
by Butler & Tanner Limited, Frome


DEDICATED
TO MY MOTHER. 


“Let us sit on this log at the roadside,” says I, “and
forget the inhumanity and ribaldry of the poets. It is in
the glorious columns of ascertained facts and legalised
measures that beauty is to be found. In this very log
we sit upon, Mrs. Sampson,” says I, “is statistics more
wonderful than any poem. The rings show it was sixty
years old. At the depth of two thousand feet it would
become coal in three thousand years. The deepest coal
mine in the world is at Killingworth, near Newcastle.
A box four feet long, three feet wide, and two feet eight
inches deep will hold one ton of coal. If an artery is cut,
compress it above the wound. A man's leg contains thirty
bones. The Tower of London was burned in 1841.”

“Go on, Mr. Pratt,” says Mrs. Sampson. “Them ideas
is so original and soothing. I think statistics are just as
lovely as they can be.”

O. Henry, The Handbook of Hymen.




PREFACE 


The need for a thorough exposition of the theory of statistics has been repeatedly
emphasised in recent years. The object of this book is to develop a systematic treatment
of that theory as it exists at the present time. Originally my intention was to complete 
the work in one volume, but the war has made such a course impossible. Nevertheless, 
this first volume is largely complete in itself and can, I hope, be profitably read in advance 
of the publication of its successor. 

In 1938 Dr. M. S. Bartlett, Dr. J. O. Irwin, Professor E. S. Pearson, Dr. John Wishart 
and I discussed the possibility of writing a treatise on the theory of statistics in co-operation, 
and even got as far as sketching a synopsis. This proposal, however, had to be abandoned 
after the outbreak of war, and with some misgivings I decided to proceed alone. My present 
treatment differs very considerably from the one then agreed upon, since a number of 
sacrifices of viewpoint made for the purpose of reaching unanimity are no longer necessary.
I must accordingly assume sole responsibility for the form and content of the present book,
but acknowledgment is due to my colleagues for the helpful discussions which took place
while the synopsis of the original proposal was being drafted.

Apart from the usual problems arising in writing any book with pretensions to
comprehensiveness (emphasis, rejection of unimportant material, sequence of presentation,
and so forth) there were two main questions to be decided in regard to this book: the
amount of mathematics admitted, and the point of introduction of the theory of probability.


Statistical theory is essentially mathematical, and I have not hesitated (in fact I have
been compelled) to adopt a rather advanced mathematical treatment in order to achieve
rigour where it is attainable in the present state of our knowledge. Nevertheless I have
tried (in places, perhaps, with indifferent success) to keep the mathematics to heel.
This is intended to be a book on statistics, not on statistical mathematics.

As to the place of the theory of probability, I have felt it preferable to deal with the
descriptive properties of frequency-distributions before introducing the probability concept.
This is justified both by the historical development of the subject and by the necessities
of a logical presentation. Some readers may feel that the whole theory of modern statistics
is so permeated with the sampling conception that an earlier introduction of probability
would more than offset the loss in logical sequence by the gain in didactic force. This
view I myself hold to be fundamentally wrong, but if the reader feels keenly on the subject
he has, after all, merely to read Chapters 7 and 8 immediately after Chapter 1 and the
difficulty is to a great extent resolved.
The subjects covered by the present volume may be considered under three main
heads. Chapters 1 to 6 deal with Frequency-Distributions and their properties. Chapters
7 to 12 deal with the Theories of Probability and Sampling and with the Sampling
Distributions to which they lead. Broadly, this section comprises the theory of those
distributions which are derived from parent populations for special purposes such as inferences
in Probability, and may be termed the Theory of Derived Distributions. Chapters 13 to 16
deal with the Theory of Correlation, considered as a measure of relationship, the general
theory of regression being left to the second volume. Chapter 12, on the χ²-distribution,
is perhaps something of an intrusion in the development, but in view of the
widespread applications of χ² in testing agreement between theory and observation I felt
that it should be introduced at an early stage.

The second volume will deal with the Theory of Estimation, Regression, Analysis 
of Variance, Tests of Significance, Multivariate Analysis, Theories of Statistical Inference, 
and Time Series. In the first volume it has been possible to avoid a detailed examination 
of controversial topics connected with the logic of inference in probability; the subject 
will be taken up more systematically in the second volume.

On the invaluable principle that example is better than precept, a special effort has 
been made to exemplify the theory at every stage and to provide exercises for the reader
to work out for himself. Some of the latter are rather difficult, but have nevertheless been 
included to illustrate the scope of application of the theory and to refer to results for which 
no place could conveniently be found in the text. In assembling this material I have 
drawn freely on the wealth of research work in statistical periodicals, particularly Biometrika, 
and am glad to make acknowledgment to the authors from whose papers examples have 
been taken. 

Foremost among my more specific indebtedness is that to Dr. Leon Isserlis, who 
read the whole book at the galley proof stage and to whose careful scrutiny I owe a great 
deal. I have also to thank Dr. J. O. Irwin, who allowed me to consult his draft of a chapter
originally intended for the co-operative treatise (this forms the basis of Chapter 10); 
Professor R. A. Fisher and Messrs. Oliver and Boyd, for permission to reproduce Appendix 
Tables 4 and 5 from the former’s Statistical Methods for Research Workers; and the pub- 
lishers, Messrs. Charles Griffin and Co., and the printers, Messrs. Butler & Tanner Ltd., 
who have taken great pains with some very difficult manuscript. 

I shall be grateful to any reader who notifies me of any error, omission or ambiguity,
from which, I fear, no book of this kind can be entirely free at its first appearance.


M. G. KENDALL.
LONDON,
February 1st, 1943.


NOTE TO THE FOURTH EDITION 


This edition is substantially the same as its predecessors, but a number of misprints
have been removed and some references added to work published since the issue of the
first edition. A few further examples have been added. I am indebted to several
correspondents for calling my attention to misprints and points where the presentation
was ambiguous.


M. G. K.
May, 1948.


TABLE OF CONTENTS


CHAP.
     Introductory Note
  1. Frequency-distributions
  2. Measures of location and dispersion
  3. Moments and cumulants
  4. Characteristic functions
  5. Standard distributions (1)
  6. Standard distributions (2)
  7. Probability and likelihood
  8. Random sampling
  9. Standard errors
 10. Exact sampling distributions
 11. Approximations to sampling distributions
 12. The χ²-distribution
 13. Association and contingency
 14. Product-moment correlation
 15. Partial and multiple correlation
 16. Rank correlation


APPENDIX TABLES:
  1. Frequency function of the normal distribution
  2. Distribution function of the normal distribution
  3. Distribution function of the t-distribution
  4. 5 per cent. points of the z-distribution
  5. 1 per cent. points of the z-distribution
  6. Distribution function of χ² for one degree of freedom, χ² = 0 to 1
  7. Distribution function of χ² for one degree of freedom, χ² = 1 to 10


APPENDIX DIAGRAM: Contour lines of the χ², ν surface


INDEX TO VOLUME I


INTRODUCTORY NOTE


0.1. The chapter-sections in this book are numbered serially. They are denoted
by the number of the chapter in which they occur and their serial number within that
chapter, e.g. 14.13 refers to the thirteenth section of Chapter 14. A similar procedure
is adopted for tables, equations and exercises, e.g. (7.15) refers to the fifteenth equation
of Chapter 7. In cross-references, chapter-sections are denoted by clarendon type, the others
by ordinary type.


0.2. References to printed work are given by author's name and date of publication.
In the list of references at the end of the chapter authors are arranged alphabetically.
Where articles from publications are referred to, the number of the volume is given in
clarendon type and the number of the first page of the article in ordinary type, e.g. Ann.
Math. Statist., 10, 275, refers to the article beginning on page 275 of volume 10 of the Annals
of Mathematical Statistics. Where an exercise is followed by an author's name and a date,
the result given in the exercise appears in the article listed in the references to the chapter
concerned under these particulars. Where the result is from an article not previously
referred to a full reference is given.


0.3. The mathematical notation is that in current use, but a few symbols may be
specially mentioned:

(1) The exclamation mark ! written after an integer means the factorial of that integer.
Some writers give the symbol a more extended use for non-integral numbers by writing

$$\nu! = \Gamma(\nu + 1) = \int_0^{\infty} e^{-t}\, t^{\nu}\, dt.$$

This, of course, accords with the factorial notation, but will not be used in this book.

(2) The combinatorial sign $\binom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$ will be used in place of the older ${}^{n}C_{r}$.

(3) The summation sign will be written as $\Sigma$, e.g.

$$\sum_{j=1}^{j=n} x_j = x_1 + x_2 + \dots + x_n.$$

The symbol $\sum_{j=1}^{j=n}$ can as a rule be shortened to $\sum_{j=1}^{n}$ and in many cases to $\sum_{j}$ or merely to
$\Sigma$, the extent of the summation being clear from the context.

(4) The ordinary notation for the Γ-function (given above), the B-function, and the
hypergeometric function will be used, i.e.

$$B(p, q) = \int_0^1 x^{p-1} (1 - x)^{q-1}\, dx = \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p+q)}$$

and

$$F(\alpha, \beta, \gamma, x) = 1 + \frac{\alpha\beta}{1\cdot\gamma}\,x + \frac{\alpha(\alpha+1)\,\beta(\beta+1)}{1\cdot 2\cdot\gamma(\gamma+1)}\,x^2 + \frac{\alpha(\alpha+1)(\alpha+2)\,\beta(\beta+1)(\beta+2)}{1\cdot 2\cdot 3\cdot\gamma(\gamma+1)(\gamma+2)}\,x^3 + \dots$$

(5) Where the exponent is concise, the exponential function will be written as a power
of e, for example $e^{ix}$. But where it is lengthy we shall use the notation exemplified by
$\exp\{-\tfrac{1}{2}(x^2 - 2\rho xy + y^2)\}$ instead of $e^{-\frac{1}{2}(x^2 - 2\rho xy + y^2)}$.
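The identities in 0.3 can be checked numerically. What follows is only a minimal sketch in Python, which is not a notation used in this book; the function names `binomial`, `beta` and `beta_by_quadrature` are illustrative assumptions, and only the standard `math` module is relied upon.

```python
import math


def binomial(n: int, r: int) -> int:
    """Combinatorial sign of 0.3 (2): n! / (r! (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


def beta(p: float, q: float) -> float:
    """B-function of 0.3 (4), computed as Gamma(p) Gamma(q) / Gamma(p + q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def beta_by_quadrature(p: float, q: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation to the integral of x^(p-1) (1-x)^(q-1) over (0, 1).

    Assumption: p and q are at least 1, so that the integrand is bounded.
    """
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += x ** (p - 1) * (1 - x) ** (q - 1)
    return h * total


# Gamma(v + 1) agrees with v! when v is an integer.
print(math.gamma(4 + 1), math.factorial(4))
# The integral and the Gamma-function forms of B(2, 3) agree closely.
print(beta(2, 3), beta_by_quadrature(2, 3))
```

The midpoint rule is chosen here purely for brevity; any standard quadrature would serve equally well.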


0.4. In some fields it is useful to preserve a distinction between a statistical parameter
in a population and the estimate of that parameter from a sample. Where possible, the
former will be denoted by a Greek letter and the latter by a Roman letter, e.g. the product-
moment correlation coefficient of a population is denoted by ρ and that of a sample by r.
It is not, however, always possible to preserve this distinction, as for instance with the
multiple correlation coefficient R, in which case a Greek capital would be confused with the
Roman P. Complete notational consistency can only be achieved at the expense of
jettisoning a great deal of accepted statistical usage, and even then would probably
result in some cumbrous symbols.


0.5. In order to enable the reader to follow the worked examples and illustrative
material, a few tables of functions commonly required are given at the end of this volume.
These tables are in no way a substitute for the comprehensive sets which have been
published and which are a necessary adjunct to most practical and a good deal of theoretical
work. Frequent reference will be made to the following:

Tables for Statisticians and Biometricians, edited by Karl Pearson, Parts I and II, Biometrika
Office, University College, London, W.C.1.

Statistical Tables for use in Biological, Agricultural and Medical Research, by R. A. Fisher
and F. Yates, Oliver and Boyd, Edinburgh.


The following are also useful:

Tables of the Incomplete Γ-function, edited by Karl Pearson, Biometrika Office, University
College, London, W.C.1.

Tables of the Incomplete B-function, edited by Karl Pearson, Biometrika Office, University
College, London, W.C.1.

The Kelley Statistical Tables, by T. L. Kelley, Macmillan, London and New York.

“Tables of Pearson's Type III Function,” by L. R. Salvosa, Ann. Math. Statist., 1930,
1, 191.

Tables of the Higher Mathematical Functions, edited by H. T. Davis, Parts I and II, Principia
Press, Bloomington, Indiana.

Tables of Random Sampling Numbers, by L. H. C. Tippett, Tracts for Computers, No. 15,
Cambridge University Press.

Tables of Random Sampling Numbers, by M. G. Kendall and B. Babington Smith, Tracts
for Computers, No. 24, Cambridge University Press.

Tables of the Correlation Coefficient, by F. N. David, Biometrika Office, University College,
London, W.C.1.

Tables of tan⁻¹ x and log (1 + x²), by L. J. Comrie, Tracts for Computers, No. 23.

Tables of the Probability Integral, by W. F. Sheppard, British Association Mathematical
Tables, Vol. 7, Cambridge University Press.


0.6. The references given at the end of the chapters are mainly intended to guide
further reading and are not exhaustive. A more extensive bibliography will be given at
the end of volume II of this book.


CHAPTER 1 
FREQUENCY-DISTRIBUTIONS 


Statistics as the Science of Populations

1.1. Among the many subjects about which statisticians disagree is the definition
of their science. In the Revue de l'Institut International de Statistique for 1935 (vol. 3,
page 388) Dr. W. F. Willcox listed well over a hundred definitions of statistics, and the
list was far from exhaustive. Even when we exclude those definitions which were formulated
before the subject reached its present extent we are left with a variety of choices, and
there is no definitive description of the scope of the science of statistics with which we
can begin this book.


1.2. The fundamental notion in statistical theory is that of the group or aggregate,
a concept for which statisticians use a special word, “population”. This term will be
generally employed to denote any collection of objects under consideration, whether
animate or inanimate; for example, we shall consider populations of men, of plants, of
mistakes in reading a scale, of barometric heights on different days, and even populations
of ideas, such as that of the possible ways in which a hand of cards might be dealt. The
notion common to all these things is that of aggregation.

It is with the properties of populations that statistics is mainly concerned. In
considering a population of men we are not interested, statistically speaking, in whether some
particular individual has brown eyes or is a forger, but rather in how many of the individuals
have brown eyes or are forgers, and whether the possession of brown eyes goes with a
propensity to forgery in the population. We are, so to speak, concerned with the properties
of the population itself. Such a standpoint can occur in physics as well as in demographic
sciences. For instance, in discussing the behaviour of a gas we are not so much interested
in the behaviour of particular molecules, as in that of the aggregate of molecules which
go to compose the gas. The statistician, like Nature, is mainly concerned with the species
and is careless of the individual.

1.3. We may therefore begin an approach to a definition of our subject by the
following: statistics is the branch of scientific method which deals with the properties
of populations. This, however, is rather too general. Statistics deals only with the
numerical properties. A dictionary, for example, sets out a population of words, and
among the properties of that population which are a suitable subject for scientific inquiry
is that of word-derivation. It is not of statistical concern, however, to know that some
words are derived from Latin, some from Anglo-Saxon and some from Hindustani. The
subject would only assume a statistical aspect if we were to inquire how many words were
derived from the different sources.


1.4. As a second approximation to our definition we may then try the following:
statistics is the branch of scientific method which deals with the data obtained by counting
or measuring the properties of populations. This again is a little too general.
A set of logarithm tables is a population of numerals,
but it is hardly a subject for statistical inquiry, for every numeral is determined according
to mathematical laws. The statistician is rather concerned with populations which occur
in Nature and are thus subject to the multitudinous influences at work in the world at
large. His populations rarely, if ever, conform exactly to simple mathematical rules,
and in fact it is in the departure from such rules that he often finds topics of the greatest
statistical interest. To allow for this factor we may then formulate our definition as
follows:

Statistics is the branch of scientific method which deals with the data obtained by 
counting or measuring the properties of populations of natural phenomena. In this 
definition “ natural phenomena ” includes all the happenings of the external world, whether 
human or not. 

This is as far as we need pursue the matter. The reader who is interested enough to
look through the definitions listed by Dr. Willcox in the article referred to above will
find, I think, that in the light of this definition there is a perceptible thread of continuity
running through them.

1.5. For the avoidance of misunderstandings in the interpretation of this definition
it may be as well to point out that “statistics,” the name of the scientific method, is
a collective noun and takes the singular. The same word “statistics” is also applied to
the numerical material with which the method operates, and in such a case takes the
plural. Later in this book we shall meet the singular form “statistic,” which is not,
as might be supposed, an individual item of information which in the aggregate would
compose “statistics,” but is the name given to an estimate of certain unknown measures
of a population.


Frequency-Distributions


1.6. Consider a population of members each of which bears some numerical value
of a variable, e.g. of men measured according to height or of flowers classified according
to numbers of petals. This variable we shall call a variate. If it can assume only
a number of isolated values it will be called discontinuous, and if it can assume any value
of a continuous range, continuous. The population of members will then correspond to
a population of variate-values, and it is the properties of this latter population which
we have to consider.

If the population consists of only a few members we can without much difficulty
consider the population of variate-values exhibited by them; but if, as usually happens,
the aggregate is large (or, in a sense defined later, infinite), the set of variate-values has
to be reduced in some way before the mind can grasp their significance. This is done
by classification of the individuals into ranges of the variate. So far as possible the ranges
should be equal, so that the numbers falling into different ranges are comparable. The
interval is called the class-interval (or simply the interval) and the number of members
bearing a variate-value falling into a given class-interval is the class-frequency (or simply
the frequency). The manner in which the class-frequencies are distributed over the
class-intervals is called the frequency-distribution (or simply the distribution).
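The classification described in 1.6 may be sketched in a few lines of Python. This is an illustration only: the function name `frequency_distribution`, the sample of heights and the choice of intervals closed on the left and open on the right are all assumptions made for the example, not anything prescribed by the text.

```python
from collections import Counter


def frequency_distribution(values, start, width):
    """Return {lower bound of class-interval: class-frequency}.

    Assumption: equal class-intervals of the given width, each one
    'greater than or equal to lower and less than lower + width'.
    """
    counts = Counter()
    for v in values:
        k = int((v - start) // width)  # index of the class-interval containing v
        counts[start + k * width] += 1
    return dict(sorted(counts.items()))


# Hypothetical variate-values (heights in inches), invented for illustration.
heights = [61.2, 64.8, 66.1, 66.9, 67.4, 67.5, 68.0, 69.3, 70.6, 72.2]
distribution = frequency_distribution(heights, start=60, width=2)
for lower, frequency in distribution.items():
    print(f"{lower}-  {frequency}")
```

Intervals in which no member falls are simply absent from the result; a fuller tabulation would list them with frequency zero.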

1.7. Tables 1.1 and 1.2 give some frequency-distributions of observed populations
classified according to a single variate. Table 1.1 shows the 1567 Local Government
Areas of England and Wales distributed according to the variate “birth-rate.” Here,
for example, there were 7 districts with a birth-rate of between 5.5 and 6.5 per thousand,
and 271 with a birth-rate between 13.5 and 14.5 per thousand. The general nature of


TABLE 1.1

Showing the Number of Local Government Areas in England with Specified Birth-rates per
Thousand of Population.
(Material from the Registrar-General's Statistical Review of England and Wales for 1933.)

                                   Number of Districts with
Birth-rate.                        Birth-rate in Specified Range.

 1.5 and not exceeding  2.5                     1
 2.5  "   "      "      3.5                     2
 3.5  "   "      "      4.5                     2
 4.5  "   "      "      5.5                     3
 5.5  "   "      "      6.5                     7
 6.5  "   "      "      7.5                     9
 7.5  "   "      "      8.5                    14
 8.5  "   "      "      9.5                    41
 9.5  "   "      "     10.5                    83
10.5  "   "      "     11.5                   131
11.5  "   "      "     12.5                   192
12.5  "   "      "     13.5                   242
13.5  "   "      "     14.5                   271
14.5  "   "      "     15.5                   190
15.5  "   "      "     16.5                   127
16.5  "   "      "     17.5                    89
17.5  "   "      "     18.5                    78
18.5  "   "      "     19.5                    37
19.5  "   "      "     20.5                    21
20.5  "   "      "     21.5                    17
21.5  "   "      "     22.5                     4
22.5  "   "      "     23.5                     4
23.5  "   "      "     24.5                     2

TOTAL                                        1567
TABLE 1.2

Showing the Numbers of Persons in the United Kingdom liable to Sur-tax and Super-tax in
the Year beginning 5th April 1931, classified according to the Magnitude of their
Annual Income.
(From the Statistical Abstract for the United Kingdom for the Years 1913 and 1919-32,
Cmd. 4489.)

Annual Income                  Number of        Frequency per
(£000).                        Persons.         £500 Interval.

  2 and not exceeding   2.5     23,988             23,988
  2.5  "   "      "     3       15,781             15,781
  3    "   "      "     4       17,979              8,989
  4    "   "      "     5        9,755              4,877
  5    "   "      "     6        5,921              2,960
  6    "   "      "     7        3,729              1,864
  7    "   "      "     8        2,546              1,273
  8    "   "      "    10        3,193                798
 10    "   "      "    15        3,616                362
 15    "   "      "    20        1,328                133
 20    "   "      "    25          679                 68
 25    "   "      "    30          378                 38
 30    "   "      "    40          372                 19
 40    "   "      "    50          192                 10
 50    "   "      "    75          182                  4
 75    "   "      "   100           57                  1
100 and over                        94                  -

Total number of persons         89,790                  -


[Fig. 1.1. Frequency Polygon of the Data of Table 1.1. Horizontal axis: Birth-rate (per thousand of population). Vertical axis: Frequency of Local Government Areas.]

the distribution is shown in this table in a way which would be quite impossible if each
of the 1567 districts were shown separately. The greatest number of districts fall within
the range 13.5-14.5 per thousand and the frequencies tail off on either side of this value.

Table 1.2 shows the number of persons subject to sur-tax and super-tax in the United
Kingdom in 1931 classified according to the variate “income.” The class-intervals here
are unequal (a typical defect of official figures) and in the last column
of the table is a reduction of the class-frequencies to comparability, namely,
to frequency per £500 within the class-interval concerned. Looking at this
column we see that the maximum frequency per £500 in this case is at the
beginning of the frequency-distribution.
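The reduction to frequency per £500 described for Table 1.2 is a single division, sketched below in Python. The function name `per_500` and the choice of rows are assumptions made for illustration; the figures themselves are the first four classes of Table 1.2 together with the classes 7-8 and 8-10. The quotients are left unrounded, whereas the table prints whole numbers.

```python
# (lower limit in £000, upper limit in £000, number of persons), from Table 1.2
ROWS = [
    (2.0, 2.5, 23988),
    (2.5, 3.0, 15781),
    (3.0, 4.0, 17979),
    (4.0, 5.0, 9755),
    (7.0, 8.0, 2546),
    (8.0, 10.0, 3193),
]


def per_500(lower, upper, persons):
    """Class-frequency reduced to a frequency per £500 of income.

    The limits are in thousands of pounds, so £500 is 0.5 of a unit.
    """
    return persons * 0.5 / (upper - lower)


for lower, upper, persons in ROWS:
    print(f"{lower:>4} to {upper:>4}: {persons:>6} persons, "
          f"{per_500(lower, upper, persons):>9.2f} per £500")
```

Although the class 3-4 contains more persons than the class 2.5-3, its frequency per £500 is smaller, which is the point made in the text.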


1.8. The frequency-distribution may be represented graphically.
Measuring the variate-value along the x-axis and frequency per class-
interval along the y-axis, we erect at the abscissa corresponding to the
centre of each class-interval an ordinate equal to the frequency per unit
interval in that interval. The ends of these ordinates are joined by
straight lines, one to the next. The diagram so obtained is called a
frequency polygon. Fig. 1.1 shows the frequency polygon for the data
of Table 1.1.

As a variant of this procedure we may erect on the abscissa
range corresponding to each class-interval a rectangle whose area
is proportional to the frequency in that interval. A diagram
constructed in this way is called a histogram. Fig. 1.2 shows
such a histogram for the data of Table 1.2. It is evident that
the histogram is a more suitable form of representation when the
class-intervals are unequal.

[Fig. 1.2. Histogram of the Data of Table 1.2. Horizontal axis: Annual Income (£000). Vertical axis: Number of Persons (000's).]

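The two constructions of 1.8 differ only in what is drawn once the frequency per unit interval is known. The Python sketch below computes the coordinates without plotting them; the function names `polygon_vertices` and `histogram_rectangles` are illustrative assumptions, and the figures are three classes from Table 1.1 and two from Table 1.2.

```python
def polygon_vertices(bounds, freqs):
    """Vertices of the frequency polygon: (centre of interval, frequency per unit interval)."""
    return [((lo + hi) / 2, f / (hi - lo)) for (lo, hi), f in zip(bounds, freqs)]


def histogram_rectangles(bounds, freqs):
    """Rectangles (left, right, height) whose area, height x width, equals the class-frequency."""
    return [(lo, hi, f / (hi - lo)) for (lo, hi), f in zip(bounds, freqs)]


# Three classes of Table 1.1 (birth-rate per thousand), all of unit width.
birth_bounds = [(12.5, 13.5), (13.5, 14.5), (14.5, 15.5)]
birth_freqs = [242, 271, 190]
print(polygon_vertices(birth_bounds, birth_freqs))

# Two unequal classes of Table 1.2 (annual income in £000).
income_bounds = [(3, 4), (8, 10)]
income_freqs = [17979, 3193]
for left, right, height in histogram_rectangles(income_bounds, income_freqs):
    print(left, right, height, "area:", height * (right - left))
```

With equal intervals the heights are proportional to the frequencies themselves; with unequal intervals it is the area, not the height, that carries the frequency.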


1.9. A few practical points in the tabulation of observed frequency-distributions
may be noted:

(1) It has been remarked that wherever possible the class-intervals should be equal. 
The importance of this will be more appreciated in subsequent chapters; but it is already 
evident that comparability is difficult to carry out by inspection when there exist inequalities 
in class-intervals. On running the eye down the second column of Table 1.2, for example, 
we note that the frequencies in intervals 3-4 and 8-10 are greater than in the immediately 
preceding intervals; but this is merely due to a change in the width of the intervals 
at those points and, as is seen from the third column, the frequency per unit interval 
decreases steadily. 

(2) It is important to specify the class-interval with precision. We not infrequently
meet with such classifications as “0-10, 10-20, 20-30,” etc. To which interval is a member
with variate-value 10 assigned? Obviously the classification is ambiguous if such values
can in fact arise. We must either take the intervals “greater than or equal to 0 and less
than 10, greater than or equal to 10 and less than 20,” or make it clear what convention
we use to allot a variate-value falling on the border between two neighbouring intervals,
e.g. it might be decided to allot one-half of the member to each. There are various ways
of indicating the class-interval in practical tables, e.g. “10-, 20-, 30-” means “greater
than or equal to 10 and less than 20,” and so forth. Sometimes, where a continuous
variate is concerned, there is an element of imprecision in the specification of the fineness
to which the measurements are made; for example, if we are measuring lengths in
centimetres to the nearest centimetre, an interval shown as “greater than 15 and less than
18” means an interval of “greater than 15.5 and less than 17.5.” When the precision
of the measurements is known we can specify an interval by its middle point, for example,
in this case, 16.5.


TABLE 1.3

Showing the Number of Deaths from Scarlet Fever at Different Ages in England and Wales
in 1933.
(Data from Registrar-General's Statistical Review of England and Wales for 1933, Tables,
Part I, Medical.)

Age in Years.      Number of Deaths.      Number per Year.

  0-                      16                   16
  1-                      69                   69
  2-                      89                   89
  3-                      74                   74
  4-                      74                   74
  5-                     213                   42.6
 10-                      70                   14.0
 15-                      27                    5.4
 20-                      26                    5.2
 25-                      17                    3.4
 30-                      12                    2.4
 35-                      11                    2.2
 40-                      10                    2.0
 45-                       6                    1.2
 50-                       4                    0.8
 55-                       5                    1.0
 60-                       3                    0.6
 65-                       1                    0.2
 70-                       1                    0.2
 75-                       1                    0.2
 80-                       -                    -

TOTAL                    729                    -



(3) Remark (1) about the importance of equality of class-intervals should not be held 
to preclude the specification of frequencies in finer intervals where the frequency is changing 
very rapidly. Table 1.3, for instance, shows the number of deaths from scarlet fever in 
England and Wales in 1933 according to the variate “age at death." If the frequencies 
in the interval “ 0 and less than 5” were not subdivided and were thus shown as a total 
322 for the interval, we might draw the conclusion from the uniformly decreasing number 
of deaths as the variate increases that the greatest number of deaths occurred in the 
first year of life. This is not so, as is shown by the individual frequencies in the first 
five years. 

(4) Perhaps it is hardly necessary to add that the histogram is not a suitable method 
of representing data classified according to discontinuous variates. It shows the class- 
frequency uniformly dispersed over the whole interval, whereas if the variate is discon- 
tinuous, frequencies must necessarily be concentrated at certain points. 
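Points (2) and (3) above can be made concrete with a short Python sketch. It is offered as an illustration under stated assumptions only: the function names, the small sample of variate-values and the restriction to exactly representable (integer) values are all choices made for the example. The scarlet fever figures are the first seven classes of Table 1.3.

```python
def tally_half_open(values, start, width, n_classes):
    """Convention 'greater than or equal to lower and less than upper'."""
    freq = [0] * n_classes
    for v in values:
        freq[int((v - start) // width)] += 1
    return freq


def tally_with_split(values, start, width, n_classes):
    """Convention allotting one-half of a border member to each neighbouring interval."""
    freq = [0.0] * n_classes
    for v in values:
        k, remainder = divmod(v - start, width)
        k = int(k)
        if remainder == 0 and 0 < k < n_classes:
            freq[k - 1] += 0.5  # half to the interval below the border
            freq[k] += 0.5      # half to the interval above it
        else:
            freq[k] += 1.0
    return freq


def true_limits(lowest_recorded, highest_recorded, unit):
    """Limits implied when measurements are recorded to the nearest `unit`."""
    return lowest_recorded - unit / 2, highest_recorded + unit / 2


sample = [0, 5, 10, 10, 15, 20, 29]   # hypothetical variate-values
print(tally_half_open(sample, 0, 10, 3))
print(tally_with_split(sample, 0, 10, 3))
print(true_limits(16, 17, 1))          # recorded as 'greater than 15 and less than 18'

# Point (3): (age at start of class, width in years, deaths), from Table 1.3.
fine = [(0, 1, 16), (1, 1, 69), (2, 1, 89), (3, 1, 74), (4, 1, 74), (5, 5, 213), (10, 5, 70)]


def per_year(rows):
    """Deaths per year of age within each class."""
    return [(age, deaths / width) for age, width, deaths in rows]


def pool_first(rows, n):
    """Replace the first n classes by a single pooled class."""
    head = rows[:n]
    pooled = (head[0][0], sum(w for _, w, _ in head), sum(d for _, _, d in head))
    return [pooled] + rows[n:]


print(per_year(pool_first(fine, 5)))   # pooled: the rate falls steadily from age 0
print(per_year(fine))                  # subdivided: the greatest rate is in the third year
```

Whichever border convention is adopted, the class-frequencies sum to the same total; only the allocation between neighbouring intervals changes.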


Frequency-Distributions : Discontinuous Variates 


1.10. It will be useful at this stage to give some examples of the frequency-distri- 
butions which occur in practice. 

Table 1.4 shows the distribution of digits in numbers taken from a four-figure telephone
directory. The numbers were chosen by opening the directory haphazardly and taking
the last two digits of all the numbers on the page except those in heavy type. The
distribution is irregular, but from a cursory inspection of the table we are inclined to suppose
that the digits occur approximately equally frequently in the larger population from
which these 10,000 members were chosen. We shall see later (p. 193) that the divergences
from the average frequency per digit, 1000, are not accidental sampling effects; but at this
stage it is sufficient to note that the data suggest for consideration a population of equally
frequent members.


TABLE 1.4

Showing Number of Different Digits chosen haphazardly from the London Telephone
Directory.

(M. G. Kendall and B. Babington Smith (1938), Jour. Roy. Statist. Soc., 101, 147.)

Digit          0      1      2      3      4      5      6      7      8      9     TOTAL

Frequency    1026   1107    997    966   1075    933   1107    972    964    853   10,000

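The divergences from the average frequency mentioned in connection with Table 1.4 can be listed directly. The Python sketch below is illustrative only; the function name `deviations_from_average` is an assumption, and no test of significance is attempted here, that being the business of a later chapter.

```python
# Frequencies of the digits 0 to 9, from Table 1.4.
digit_freq = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]


def deviations_from_average(freqs):
    """Return the average frequency and each frequency's divergence from it."""
    average = sum(freqs) / len(freqs)
    return average, [f - average for f in freqs]


average, deviations = deviations_from_average(digit_freq)
print("average frequency per digit:", average)
for digit, d in enumerate(deviations):
    print(digit, f"{d:+.0f}")
```

The divergences necessarily sum to zero, so it is their individual sizes, not their total, that bear on the question of equal frequency.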


Table 1.5 shows the distribution of a number of seed capsules of Shirley poppies
according to the variate “number of stigmatic rays.” The distribution in this case is
more regular, there being a maximum frequency at 13 and a steady decrease on either
side.



TABLE 1.5

Showing the Frequencies of Seed Capsules on certain Shirley Poppies with Different Numbers
of Stigmatic Rays.

(Cited from G. Udny Yule (1902), Biometrika, 2, 89.)

Number of             Number of Capsules with said
Stigmatic Rays.       Number of Stigmatic Rays.

      6                          3
      7                         11
      8                         38
      9                        106
     10                        152
     11                        238
     12                        305
     13                        315
     14                        302
     15                        234
     16                        128
     17                         50
     18                         19
     19                          3
     20                          1

TOTAL                         1905


In Table 1.6, on the other hand, showing suicides among women in some German
states in certain years according to the variate “number of suicides per year,” the
distribution reaches its maximum frequency in the region 1-3 suicides and then tails off
rather slowly.


TABLE 1.6

Showing Suicides of Women in Eight German States in Fourteen years.

(Von Bortkiewicz, Das Gesetz der kleinen Zahlen, 1898.)

Number of Suicides per year | 0 |  1 |  2 |  3 |  4 |  5 | 6 | 7 | 8 | 9 | 10 and over | TOTAL.
Frequency .                 | 9 | 19 | 17 | 20 | 15 | 11 | 8 | 2 | 3 | 5 |      3      |  112

Frequency-Distributions : Continuous Variates

1.11. Table 1.7 shows a number of adult males in the United Kingdom (including,
at the time of the collection of the data, the whole of Ireland), distributed according to
the variate “height in inches.” The frequency polygon is shown in Fig. 1.3. It will be
seen that the distribution is almost symmetrical, there being a maximum ordinate at
67- inches and a steady decrease in frequency on either side of the maximum.




TABLE 1.7

Showing the Frequency-distributions of Statures for Adult Males born in the United Kingdom
(including the whole of Ireland).

(Final Report of the Anthropometric Committee to the British Association, 1883, p. 256.)

As measurements are stated to have been taken to the nearest 1/8th of an inch, the
class-intervals are here presumably 56 15/16 to 57 15/16, 57 15/16 to 58 15/16, and so on.

Height without  | Number of Men within
Shoes (inches). | said Limits of Height.
      57-       |         2
      58-       |         4
      59-       |        14
      60-       |        41
      61-       |        83
      62-       |       169
      63-       |       394
      64-       |       669
      65-       |       990
      66-       |      1223
      67-       |      1329
      68-       |      1230
      69-       |      1063
      70-       |       646
      71-       |       392
      72-       |       202
      73-       |        79
      74-       |        32
      75-       |        16
      76-       |         5
      77-       |         2
     TOTAL      |      8585


Fig. 1.3. Frequency-distribution of the Data of Table 1.7. Values of the abscissa correspond
to the beginning of class intervals. (Abscissa : stature in inches ; ordinate : frequency,
number of men.)


This more-or-less uniform “tailing off” of frequencies is very common in observed
distributions, but the symmetrical property is comparatively rare. Table 1.[illegible] is roughly
symmetrical, but Tables 1.8 and 1.9, showing respectively a number of Australian marriages
distributed according to bridegroom's age, and a number of dairy farms distributed
according to costs of production of milk, illustrate that various degrees of asymmetry
can occur. An extreme form is shown in Table 1.3.


TABLE 1.8

Showing Numbers of Marriages contracted in Australia, 1907-14, arranged according to the
Age of Bridegroom in 3-Year Groups.
(From S. J. Pretorius (1930), Biometrika, 22, 210.)

Age of Bridegroom (Central Value | Number of
of 3-year Range, in years).      | Marriages.
             16.5                |      294
             19.5                |   10,995
             22.5                |   61,001
             25.5                |   73,054
             28.5                |   56,501
             31.5                |   33,478
             34.5                |   20,569
             37.5                |   14,281
             40.5                |    9,320
             43.5                |    6,236
             46.5                |    4,770
             49.5                |    3,620
             52.5                |    2,190
             55.5                |    1,655
             58.5                |    1,100
             61.5                |      810
             64.5                |      649
             67.5                |      487
             70.5                |      326
             73.5                |      211
             76.5                |      119
             79.5                |       73
             82.5                |       27
             85.5                |       14
             88.5                |        5
            TOTAL                |  301,785


TABLE 1.9

Showing Numbers of Dairy Farms in England and Wales according to Cost of Production
of Milk in 1935-6.
(Data from Costs of Milk Production in England and Wales, Interim Report No. 2,
Agricultural Economics Research Institute, Oxford.)

Cost of Production  | No. of Farms.
(pence per gallon). |
        4-          |       4
        5-          |       9
        6-          |      34
        7-          |      77
        8-          |      94
        9-          |      88
       10-          |      65
       11-          |      40
       12-          |      15
       13-          |       4
       14-          |       5
       15-          |       2
      TOTAL         |     437


In this connection Table 1.10, showing a number of men distributed according to weight,
is of interest for comparison with the height data of Table 1.7. The latter is symmetrical
but the former is not.




TABLE 1.10

Frequency-distribution of Weights for Adult Males born in the United Kingdom.

(Loc. cit., Table 1.7. Weights were taken to the nearest pound, consequently the true
class-intervals are 89.5-99.5, 99.5-109.5, etc.)

Weight in lbs. | Frequency.
      90-      |       2
     100-      |      34
     110-      |     152
     120-      |     390
     130-      |     867
     140-      |    1623
     150-      |    1559
     160-      |    1326
     170-      |     787
     180-      |     476
     190-      |     263
     200-      |     107
     210-      |      85
     220-      |      41
     230-      |      16
     240-      |      11
     250-      |       8
     260-      |       1
     270-      |       -
     280-      |       1
    TOTAL      |    7749


1.12. When the asymmetry of a distribution such as that of Table 1.3 becomes 
extreme we may be unable to determine whether, near the maximum ordinate, there 
is a fall on either side, or whether the maximum occurs right at the start of the distribution. 
This would have been the case in Table 1.3 if we had not the finer grouping for the first 
five years of life ; and it is the case in Table 1.2, in which the maximum frequency apparently
occurs at or very close to an income of £2,000 per annum. Asymmetrical distributions are
sometimes called “skew” ; and those such as Table 1.2 are called “J-shaped.”


1.13. In rare cases the distribution may have maxima at both ends, as in Table 1.11,
showing a number of days distributed according to degree of cloudiness. This is known
as a U-shaped distribution.


TABLE 1.11

Showing the Frequencies of Estimated Intensities of Cloudiness at Greenwich during the
Years 1890-1904 (excluding 1901) for the Month of July.

(Data from Gertrude E. Pearse (1928), Biometrika, 20A, 336.)

Degrees of  | Frequency.
Cloudiness. |
     10     |     676
      9     |     148
      8     |      90
      7     |      65
      6     |      55
      5     |      45
      4     |      45
      3     |      68
      2     |      74
      1     |     129
      0     |     320
   TOTAL    |    1715


1.14. Distributions also occur which in general appearance resemble sections of the 
types already mentioned. A J-shaped distribution, for example, resembles the “ tail ” 
of the symmetrical distribution of Table 1.7. The suicide data of Table 1.6 may be regarded
as a symmetrical distribution truncated just below the maximum ordinate by the impossi- 
bility of the occurrence of negative values of the variate. This sort of conception is 
sometimes useful in fitting curves to observed data—a given analytical curve may fit the 
data quite well in a certain variate range, but may also extend into regions where the 
data cannot, so to speak, follow it. 


1.15. The distributions considered up to this point have one thing in common— 
they have only one maximum or, in the case of the U-shaped curve, only one minimum. 
Distributions also occur showing several maxima, Tables 1.12 and 1.13 being instances in 
point. The first, showing a number of deaths according to age at death, is typical of
death distributions. Near the start of the distribution there is a maximum and a rapid
fall in the frequency ; there is an indication of another maximum about the age 20-25 ; and
a pronounced maximum about the age 70-75, the frequencies beyond that point tailing
off to zero. It is natural to wonder whether such a distribution can be usefully considered
as three superposed distributions, a J-shaped distribution indicative of infantile mortality,
a more or less symmetrical single-humped distribution with a maximum at 20-25, indicative
of deaths at the adventurous age, and a skew distribution with a maximum at 70-75, the
ordinary death curve of senescence.


TABLE 1.12

Showing the Number of Male Deaths in England and Wales for 1930-32, classified by Ages
at Death.

(Data from Registrar-General’s Statistical Review of England and Wales, 1933, text.)

Age at Death  | Number of Deaths.
(years).      |
      0-      |   [illegible]
      5-      |    17,808
     10-      |     7,305
     15-      |    13,062
     20-      |    16,741
     25-      |    16,126
     30-      |    15,073
     35-      |    18,345
     40-      |    23,778
     45-      |    33,158
     50-      |    43,812
     55-      |    56,639
     60-      |    68,103
     65-      |    80,690
     70-      |    84,041
     75-      |   [illegible]
     80-      |   [illegible]
     85-      |    19,913
     90-      |   [illegible]
     95-      |       767
100 and over  |        48
    TOTAL     |   729,442




TABLE 1.13

Showing Number of Trypanosomes from Glossina morsitans classified according to Length
in Microns.

(From K. Pearson (1914-15), Biometrika, 10, 112. Length presumably to nearest micron.)

Length     | Frequency.
(microns). |
    15     |       7
    16     |      31
    17     |     148
    18     |     230
    19     |     326
    20     |     252
    21     |     237
    22     |     184
    23     |     143
    24     |     115
    25     |     130
    26     |     110
    27     |     127
    28     |     133
    29     |     113
    30     |      96
    31     |      54
    32     |      44
    33     |      11
    34     |       7
    35     |       2
   TOTAL   |    2500


A similar dissection of a complex distribution could be undertaken for the data of 
Table 1.13, showing a number of trypanosomes from the tsetse fly, Glossina morsitans, 
classified according to length. We are led to suspect here that the distribution is composed 
of the addition of several others (and this, by the way, has led to a suggestion that the 
trypanosomes are a mixture of distinct types). 


Frequency Functions and Distribution Functions 


1.16. The examples given above illustrate the remarkable fact that the majority 
of the frequency-distributions encountered in practice possess a high degree of regularity. 
The form of the frequency polygons and histograms above suggests, almost inevitably, 
that our data are approximations to distributions which can be specified by smooth curves 
and simple mathematical expressions. This approach to the concept of the frequency 
function, however, requires some care, particularly for continuous distributions. 

Consider in the first place a discontinuous distribution such as that of Table 1.4. Let
us represent our variate by x. Then we may say that x can take any of the ten values
0, 1, . . . 9 and that the frequency of x, say f(x), is given by the table, that is to say,
f(0) = 1026, f(1) = 1107, f(2) = 997, and so on. The frequency table, in fact, defines the
frequency function. Furthermore, most of the frequencies in the table are approximately
1,000, and we may then consider the observed distribution as approximating to that
defined by
\[ f(x) = 1000, \qquad x = 0, 1, \ldots, 9 \]
or, more generally, to the distribution
\[ f(x) = k, \qquad x = 0, 1, \ldots, 9 \tag{1.2} \]
This is perhaps the simplest case of a discontinuous frequency function, f(x) being
a constant for all permissible values of x.


In Table 1.5 we have a discontinuous variate which can, theoretically, take an infinite 
number of values, namely, any one of the positive integers. In practice, of course, there 
must be a limit to the number of stigmatic rays which a poppy can possess, but since we 
do not know that limit we may imagine our variate as infinite in range. The frequency 
function for the table itself is again simply defined by the frequencies therein ; but if we 
wish to proceed to a conceptual generalisation of such a table we must admit a discontinuous 
function f(x) defined for all positive integral values of x. This occasions no difficulty
provided that we are able to attach some meaning to the total frequency, i.e. that
\[ \sum_{j=1}^{\infty} f(x_j) \]
converges.


1.17. Consider now the case of a continuous variate. In the ordinary data of
experience our distributions are invariably discontinuous, because our measurements can
only attain a certain degree of accuracy. For instance, we are accustomed to suppose
that the height of a man may in reality be any real number of inches in a certain range,
say 50 to 80, such as 20π. In fact, we can measure heights only to a certain accuracy,
say to the nearest thousandth of an inch. Our measurements thus consist of whole numbers
(of thousandths) from 50,000 to 80,000, and such a number as 62,831.85 (= 20,000π
approximately) cannot appear. All physical measurements are subject to this limitation,
but we accept it and nevertheless speak of our variables as “continuous,” the underlying
supposition being that the measurements are approximations to numbers which can
fall anywhere in the arithmetic continuum.


1.18. With this understanding we can consider the distribution of grouped frequencies 
as leading to the concept of a frequency function for a continuous variate. If, in one 
of the distributions above, say that of Table 1.7, we were to subdivide the intervals, we 
should probably find that up to a point the resulting frequencies were smoother and smoother. 
The reader can verify the appearance of this effect for himself by grouping the data of 
Table 1.7 in intervals of 8, 4, and 2 inches. We cannot, however, take the process too 
far, because, with a finite population, continued subdivision of the interval would sooner 
or later result in irregular frequencies, there being only a few members in each interval. 
But we may suppose that for ranges Δx, not too small, the distribution may be specified
by a function f(x) Δx, expressing that in the range ± ½Δx centred at x the frequency is
f(x) Δx, wherever x may be in the permissible range of the variate. We may suppose further
that as Δx tends to zero the population is perpetually replenished so as to prevent the
occurrence of small and irregular frequencies ; and in this way we arrive at the concept
of the frequency function for a continuous variable. We write
\[ dF = f(x)\,dx \tag{1.3} \]
expressing that the element of frequency dF between x − ½dx and x + ½dx is f(x) dx, for
all x and for dx, however small.
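
The regrouping of Table 1.7 suggested in 1.18 may be tried in a few lines of Python. What
follows is a minimal sketch and not part of the original text : it pools the one-inch classes
into classes 8, 4 and 2 inches wide. The name `regroup` and the choice of 56 inches as the
origin of the wider classes are assumptions made for the illustration.

```python
# Frequencies of Table 1.7; the key 57 stands for the class "57-" (57 up to 58 inches).
heights = {
    57: 2, 58: 4, 59: 14, 60: 41, 61: 83, 62: 169, 63: 394, 64: 669,
    65: 990, 66: 1223, 67: 1329, 68: 1230, 69: 1063, 70: 646, 71: 392,
    72: 202, 73: 79, 74: 32, 75: 16, 76: 5, 77: 2,
}

def regroup(freq, width, origin=56):
    """Pool one-inch classes into classes `width` inches wide, starting at `origin`."""
    grouped = {}
    for lower, count in freq.items():
        # Lower limit of the wide class into which this one-inch class falls.
        start = origin + ((lower - origin) // width) * width
        grouped[start] = grouped.get(start, 0) + count
    return dict(sorted(grouped.items()))

for width in (8, 4, 2):
    print(width, regroup(heights, width))
```

The total frequency is unchanged by the pooling ; only the number of classes, and with it
the apparent smoothness of the distribution, alters with the width chosen.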

1.19. This admittedly somewhat intuitive approach to the concept of the continuous
frequency-distribution appears to be the best for statistical purposes, and is certainly
the way in which the concept was originally reached. In formulating the axioms and
postulates of a rigorous mathematical theory, however, the mathematician considers a
rather more general function. There is as yet no thorough formulation of the theory
required in this connection, and it would be alien to the primary purpose of this book to
attempt one, even if the space were available. We will merely indicate in broad outline
the general approach.


1.20. We consider a function F which is defined at every point in a continuous
range and is continuous, except perhaps at a denumerable number of points. We require
that F shall be zero at the lower point of the range (which may be −∞) and a constant N
at the upper point (which may be +∞) and that it shall not decrease at any point. Such
a function is called a Distribution Function. It corresponds to the cumulated frequency
of a frequency-distribution, N being the total frequency ; for example, in Table 1.4,
F(x) = 0 for x < 0, F(x) = 1026 for 0 ≤ x < 1, F(x) = 2133 (= 1026 + 1107) for
1 ≤ x < 2, and so on. Here there are ten points of discontinuity for F(x). These points
are called “saltuses” (jumps) and F(x) in this case is called a Step Function.

If there is no saltus in the range, F(x) is continuous and monotonically increasing.
If it possesses a derivative we have the equation in differentials
\[ dF = F'(x)\,dx = f(x)\,dx \tag{1.4} \]
corresponding to (1.3). f(x) is called the Frequency Function. The mathematics of this
branch of the subject is then that of the study of functions of the class F(x) and f(x).
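
The step function of 1.20 can be built from the frequencies of Table 1.4 by cumulation. The
Python sketch below is an illustration under stated assumptions : the frequency for the
digit 8 is taken as 964, the value consistent with the stated total of 10,000, and F(x) is
taken as the total frequency less than or equal to x. The names `freq`, `cumulative` and
`F` are chosen for the example only.

```python
from bisect import bisect_right
from itertools import accumulate

digits = list(range(10))
# Frequencies of the digits 0..9 (digit 8 assumed to be 964, giving a total of 10,000).
freq = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]
cumulative = list(accumulate(freq))

def F(x):
    """Total frequency of digits less than or equal to x; a step function with ten jumps."""
    k = bisect_right(digits, x)      # number of digits not exceeding x
    return cumulative[k - 1] if k else 0

for x in (-0.5, 0, 0.5, 1, 8.9, 9):
    print(x, F(x))
```

Dividing every value by the total frequency gives the distribution function on the
convention, adopted later, that the total frequency is unity.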


1.21. The functions as thus defined are more general than those arrived at from the
statistical approach in two ways : (i) F(x) can increase monotonically in part of the
range and then possess a saltus, i.e. the frequency may be continuous for a time and then
suddenly discontinuous ; in statistical practice a variate is either continuous or
discontinuous, never both in different parts of the range ; (ii) where no saltus exists F(x) can
exist without there existing a frequency function, just as a continuous function need not
necessarily possess a derivative. In all the cases we shall consider, the existence of a
continuous variate will be accompanied by the existence of a frequency function.

The function F(x) is sometimes called a Probability Function, for reasons which will
become evident in Chapter 7 when we consider the theory of probability. Essentially,
however, it has nothing to do with probability and we shall use the term “distribution
function” only.

1.22. If the discontinuous frequency function is f(x), and F(x) is taken to be the
total frequency less than or equal to x, we have
\[ F(x) = \sum_{x_j \le x} f(x_j) \tag{1.5} \]
In the continuous case
\[ F(x) = \int_a^x dF = \int_a^x f(x)\,dx \tag{1.6} \]
where the range is a to b. We now introduce two conventions which simplify these
expressions to some extent. We shall suppose, unless the contrary is specified, that in these
mathematical expressions our frequencies are always expressed as proportions of the total
frequencies, so that the total frequency is unity and the sum or integral over the whole
range of the frequency function is also unity, i.e. F(b) = 1. Secondly, to avoid the
constant specification of the limits a and b we may, without loss of generality, suppose that
F(x) and f(x) are zero for any x less than a, and that F(x) = 1 and f(x) = 0 for any x
greater than b. With this convention we may write
\[ F(x) = \sum_{-\infty}^{x} f(x_j), \qquad F(x) = \int_{-\infty}^{x} f(x)\,dx \tag{1.7} \]
and
\[ \sum_{-\infty}^{\infty} f(x_j) = F(\infty) - F(-\infty) = 1, \qquad
   \int_{-\infty}^{\infty} f(x)\,dx = F(\infty) - F(-\infty) = 1 \tag{1.8} \]


Where it is necessary to take account of the total frequency N we may do so by multiplying
by N the frequencies given by the frequency function. In our convention, F(x) being the
total frequency less than or equal to x, F(x) is always continuous on the right.


Excursus on Stieltjes Integrals 
1.23. The distinction between discontinuous and continuous distributions, though
real and important for statistical purposes, is something of a nuisance in mathematical
investigations, and to avoid the necessity of stating all our theorems twice we shall use
a type of integral due to Stieltjes. In effect, this integral subsumes under one summatory
process the finite summation denoted by Σ and the ordinary integral denoted by ∫.


Suppose, in fact, that F(x) is a distribution function as we have defined it. Let ψ(x)
be a continuous function in the range of F(x), which we will take in the first instance to
be finite, a to b. Divide the range into n intervals at points $a = x_0, x_1, x_2, \ldots, x_{n-1},
x_n = b$. Take $\xi_1$ in the range a to $x_1$, $\xi_2$ in the range $x_1$ to $x_2$ and so on. Let
\[ S = \psi(\xi_1)\{F(x_1) - F(x_0)\} + \psi(\xi_2)\{F(x_2) - F(x_1)\}
     + \ldots + \psi(\xi_n)\{F(x_n) - F(x_{n-1})\} \tag{1.9} \]
It may be shown that as the size of the intervals $x_{j+1} - x_j$ tends to zero uniformly,
S tends to a limit which is independent of the location of the points ξ or of the boundary
points of the intervals. We then write this limit
\[ \int_a^b \psi(x)\,dF \tag{1.10} \]
and define it as the Stieltjes integral of ψ(x) with respect to F(x).

As for the case of ordinary integrals, we may now consider a and b as tending to infinity
and write, for example
\[ \int_{-\infty}^{\infty} \psi(x)\,dF, \]
provided that the limit exists.

In particular, if ψ(x) = 1, we have the distribution function
\[ F(x) = \int_{-\infty}^{x} dF. \]


1.24. If F(x) is the distribution function of a distribution possessing a continuous
frequency function, the Stieltjes integral becomes the ordinary integral
\[ \int \psi(x)\,f(x)\,dx, \]
and thus includes ordinary integration as a particular case. If F(x) is the distribution
function of a discontinuous distribution, that is to say, is a step function, a term such as
$F(x_{j+1}) - F(x_j)$ will vanish unless there is a saltus in the range $x_j$ to $x_{j+1}$. The sum S of
(1.9) must then tend to the limit (since it does tend to a limit) $\sum \psi(x_j) f(x_j)$, i.e. to the
ordinary summation of a series. The Stieltjes integral thus also includes such summation
as a particular case.
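
The two particular cases of 1.24 may be seen numerically. The Python sketch below is an
illustration only : it forms the sum S of (1.9) on equal intervals, taking each ξ at the
mid-point, first for a step function (the ten digits, each with proportional frequency
1/10) and then for a continuous distribution function (assumed here to be 1 − e^{−x}).
The function names, the choice ψ(x) = x² and the numbers of intervals are assumptions
of the example.

```python
import math

def stieltjes_sum(psi, F, a, b, n):
    """The sum S of (1.9) over n equal intervals of (a, b), each xi at the mid-point."""
    total = 0.0
    for j in range(n):
        left = a + (b - a) * j / n
        right = a + (b - a) * (j + 1) / n
        total += psi((left + right) / 2) * (F(right) - F(left))
    return total

def F_step(x):
    """Digits 0..9 equally frequent: proportion of the total not exceeding x."""
    return min(max(math.floor(x) + 1, 0), 10) / 10

def F_exp(x):
    """A continuous distribution function with frequency function exp(-x), x >= 0."""
    return 1 - math.exp(-x) if x > 0 else 0.0

def psi(x):
    return x * x

discrete = stieltjes_sum(psi, F_step, -1.0, 10.0, 11000)
series = sum(psi(d) * 0.1 for d in range(10))                 # ordinary summation
continuous = stieltjes_sum(lambda x: x, F_exp, 0.0, 40.0, 40000)
print(discrete, series, continuous)
```

In the first case S approaches the ordinary sum of ψ(x_j) f(x_j) ; in the second it approaches
the ordinary integral of x e^{−x}, which is unity over the infinite range.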


1.25. Many of the theorems of ordinary integration are true of the Stieltjes integral.
We shall frequently require the following :
\[ \left| \int_a^b \psi\,dF \right| \le \int_a^b |\psi|\,dF \tag{1.11} \]
\[ \le M \int_a^b dF \le M \tag{1.12} \]
where M is the upper bound of |ψ(x)| in the range (a, b).
\[ \int_a^b \psi\,dF = \psi(\xi) \int_a^b dF \tag{1.13} \]
where ξ is a value of x in the range (a, b).

If a and b are finite
\[ \int_a^b \sum_{j=1}^{\infty} f_j(x)\,dF = \sum_{j=1}^{\infty} \int_a^b f_j(x)\,dF \tag{1.14} \]
provided that $\sum f_j(x)$ converges uniformly in the range. The theorem is not necessarily
true if a and b, or one of them, are infinite.


The ordinary rules of partial integration are also applicable to Stieltjes integrals. 


Variate Transformations 


1.26. Suppose we have a new variate ξ related to x by some functional equation
\[ x = x(\xi) \tag{1.15} \]
ξ being continuous and differentiable in x throughout the range of x, and vice-versa. We
have then the equation in differentials
\[ dx = x'\,d\xi = \frac{dx}{d\xi}\,d\xi \tag{1.16} \]
Consequently, for a continuous distribution
\[ F(x) = \int_{-\infty}^{x} dF = \int_{-\infty}^{x} f(x)\,dx
        = \int f\{x(\xi)\}\,\frac{dx}{d\xi}\,d\xi \]
and consequently we may write the distribution as
\[ dF = f\{x(\xi)\}\,\frac{dx}{d\xi}\,d\xi \tag{1.17} \]
expressing that an element of frequency between ξ − ½dξ and ξ + ½dξ is f{x(ξ)} (dx/dξ) dξ.

The equation determining the frequency function may then be transformed as if it were
an equation in differentials. Such transformations are important in the theory of
continuous distributions. By their means many mathematically specified distributions may
be reduced to known forms, either exactly or approximately.
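For example, a distribution which we shall have to study in the theory of sampling is
\[ dF = \frac{1}{2^{(\nu-2)/2}\,\Gamma(\nu/2)}\; e^{-\chi^2/2}\,\chi^{\nu-1}\,d\chi,
   \qquad 0 \le \chi \le \infty. \]
It is readily verified by integration that F(∞) = 1.

By the transformation t = ½χ² we reduce this to
\[ dF = \frac{1}{\Gamma(\nu/2)}\; e^{-t}\,t^{\nu/2-1}\,dt, \qquad 0 \le t \le \infty, \]
a well-known form in analysis, the distribution function being the incomplete Γ-function
\[ \frac{1}{\Gamma(\nu/2)} \int_0^{t} e^{-t}\,t^{\nu/2-1}\,dt. \]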
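
A transformation of this kind is easily checked by numerical integration. The Python sketch
below is an illustration under stated assumptions : it takes the first distribution to be of
the form e^{−χ²/2} χ^{ν−1} dχ with the constant 1/{2^{(ν−2)/2} Γ(ν/2)}, puts t = ½χ², and
compares the frequency below a chosen point in the two forms. The value ν = 5, the point
χ = 2 and the helper `midpoint` are choices made for the example.

```python
import math

def midpoint(g, a, b, n=20000):
    """Mid-point rule approximation to the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

nu = 5  # illustrative value of the parameter

def f_chi(c):
    """Assumed frequency function in chi, with its normalising constant."""
    return math.exp(-c * c / 2) * c ** (nu - 1) / (2 ** ((nu - 2) / 2) * math.gamma(nu / 2))

def f_t(t):
    """Frequency function after the substitution t = chi**2 / 2."""
    return math.exp(-t) * t ** (nu / 2 - 1) / math.gamma(nu / 2)

total = midpoint(f_chi, 0.0, 20.0)           # the tail beyond 20 is negligible
chi0 = 2.0
F_chi = midpoint(f_chi, 0.0, chi0)           # frequency below chi0
F_t = midpoint(f_t, 0.0, chi0 ** 2 / 2)      # the same frequency in the new variate
print(total, F_chi, F_t)
```

The two partial integrals agree because the element of frequency is unaltered by the change
of variate ; only its expression changes.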


Again, the distribution
\[ dF = \frac{y_0\,dt}{\left(1 + t^2/\nu\right)^{(\nu+1)/2}}, \qquad -\infty \le t \le \infty \]
($y_0$ being chosen so that F(∞) = 1), a symmetrical peaked distribution of infinite range
rather like that of Fig. 1.3, may, by the substitution of $t = \sqrt{\nu}\,\tan\theta$, be transformed
into
\[ dF = \frac{y_0 \sqrt{\nu}\,\sec^2\theta\,d\theta}{\sec^{\nu+1}\theta}
      = y_0 \sqrt{\nu}\,\cos^{\nu-1}\theta\,d\theta, \]
a distribution of finite range −π/2 to +π/2, but still symmetrical. Putting now sin θ = ξ,
we have
\[ dF = y_0 \sqrt{\nu}\,(1 - \xi^2)^{(\nu-2)/2}\,d\xi \]
and again ξ² = z,
\[ dF = y_0 \sqrt{\nu}\,(1 - z)^{(\nu-2)/2}\,z^{-1/2}\,dz, \qquad 0 \le z \le 1. \]

The effect on the range of this last substitution is to be noted. ξ ranges from −1 to +1
and as it does so z ranges from +1 to 0 and back to +1. The distribution function of
the z-distribution, F(z) from 0 to z, is thus that of the ξ-distribution from −√z to +√z.
Whenever substitutions are made under which there is not a (1, 1) continuous relation
between the variates, points such as this require some watching.
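
The point about the range can be illustrated numerically. The Python sketch below is an
illustration only, worked for the particular value ν = 4 (an assumption of the example), for
which the ξ-distribution is dF = ¾(1 − ξ²) dξ on −1 to +1. It compares the slope of the
z-distribution function, formed from both branches ±√z, with the frequency function
containing the factor z^{−1/2}, and with what a single branch alone would give.

```python
import math

NU = 4        # illustrative value
C = 0.75      # normalising constant for nu = 4: the integral of (1 - xi^2) over (-1, 1) is 4/3

def F_xi(x):
    """Distribution function of dF = C (1 - xi^2) d(xi) on -1 <= xi <= 1."""
    return 0.5 + C * (x - x ** 3 / 3)

def F_z(z):
    """Frequency of z = xi^2 up to z: that of xi between -sqrt(z) and +sqrt(z)."""
    r = math.sqrt(z)
    return F_xi(r) - F_xi(-r)

def f_z(z):
    """Frequency function of z with both branches counted."""
    return C * (1 - z) ** ((NU - 2) / 2) / math.sqrt(z)

def one_branch(z):
    """What the substitution xi = +sqrt(z) alone would give: half the true value."""
    return C * (1 - z) ** ((NU - 2) / 2) / (2 * math.sqrt(z))

h = 1e-6
for z in (0.1, 0.25, 0.5, 0.9):
    slope = (F_z(z + h) - F_z(z - h)) / (2 * h)   # numerical derivative of F_z
    print(z, slope, f_z(z), one_branch(z))
```

The single-branch expression integrates to one half over the range 0 to 1, which is the sort
of discrepancy that signals a relation between the variates that is not (1, 1).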


1.27. There is one variate transformation which is worth special attention. In
the distribution
\[ dF = f(x)\,dx \]
put
\[ \xi = \int_{-\infty}^{x} f(x)\,dx. \]
Then
\[ dF = f(x)\,\frac{dx}{d\xi}\,d\xi = \frac{f(x)}{d\xi/dx}\,d\xi = \frac{f(x)}{f(x)}\,d\xi
      = d\xi, \qquad 0 \le \xi \le 1 \tag{1.18} \]
so that the distribution is transformed into the very simple “rectangular” form in which
all values of the variate from 0 to 1 are equally frequent. Any continuous distribution
can be transformed into the rectangular form ; and it follows that there exists at least
one transformation which will transform any continuous frequency-distribution into any
other continuous frequency-distribution, viz. the transformation which transforms one
into the rectangular form coupled with the reverse of that which transforms the other into
the rectangular form.
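
The transformation to the rectangular form may be tried on artificial data. The Python
sketch below is an illustration only : it draws values from a distribution with frequency
function e^{−x} (an assumed example), puts ξ = F(x) = 1 − e^{−x}, and examines whether
the transformed values are equally frequent over 0 to 1. It then applies the reverse of a
second transformation, that belonging to F(y) = y² on 0 to 1, so passing from the one
distribution to the other. The seed and sample size are arbitrary choices.

```python
import math
import random

rng = random.Random(1)                       # fixed seed so that the run is repeatable
n = 100_000
x = [rng.expovariate(1.0) for _ in range(n)]     # frequency function exp(-x), x >= 0
xi = [1.0 - math.exp(-v) for v in x]             # xi = F(x), the transformation of (1.18)

# Proportion of transformed values not exceeding u should be close to u itself.
deciles = [sum(v <= k / 10 for v in xi) / n for k in range(1, 10)]

# Reverse of the transformation belonging to F(y) = y**2 on (0, 1): y = sqrt(xi).
y = [math.sqrt(v) for v in xi]
mean_y = sum(y) / n                              # the theoretical mean of y is 2/3
print(deciles, mean_y)
```

The agreement is only approximate, since the values form a finite sample ; it improves
as the number of values is increased.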


The Genesis of Frequency-Distributions 


1.28. Up to this point we have not inquired into the origin of the various observed
frequency-distributions which have been adduced in illustration. Certain of them may
be considered apart from any question of origination from a larger population. The death
distribution of Table 1.12 is an example ; if we are interested only in the distribution of
male deaths in England and Wales in 1930-32 the whole of the population under
consideration is before us.

But in the great majority of cases the population which we are able to examine is only
part of a larger population on which our main interest is centred. The height distribution
of Table 1.7 is only a part of the population of men in the United Kingdom living at the
time of the inquiry, and it is mainly of importance in the light of the information which
it gives us about that population. Similarly the distribution of farms of Table 1.9 is largely
of interest in the information it gives about costs of milk production for the whole country.


1.29. In the two cases just mentioned, height and costs of milk production, we
have information about a certain sample of individuals chosen from an existing population.
Only lack of time and opportunity prevents us from examining the whole population.
It sometimes happens, however, that we have data which do not emanate from a finite
existent population in this way. Table 1.14 is an example. It shows the distribution of
throws with dice.


TABLE 1.14

Showing the Number of Successes (throws of 4, 5 or 6) with Throws of 12 Dice.
(Weldon’s data, cited by F. Y. Edgeworth, Encyclopædia Britannica, 11th ed., 22, 39.)

Number of  | Frequency.
Successes. |
     0     |       0
     1     |       7
     2     |      60
     3     |     198
     4     |     430
     5     |     731
     6     |     948
     7     |     847
     8     |     536
     9     |     257
    10     |      71
    11     |      11
    12     |       0
   TOTAL   |    4096


Now it is clear that, in a sense, we have not in these data got a complete population, 
for we can add to them by further casting of the dice. But these further throws do not 
exist in the sense that the unexamined men of the United Kingdom or the unexamined 
dairy farms of England and Wales exist. They have a kind of hypothetical existence con- 
ferred on them by our notion of the throwing of the dice. 

Even distributions which appear at first sight to be existent may be considered in
this light. The trypanosome distribution of Table 1.13, for instance, was obtained from
certain tsetse flies. We may consider it as a sample of all the tsetse flies in existence,
whether harbouring trypanosomes or not (an existent population) ; but we may also
consider it as a sample of what the distribution would be if all the tsetse flies were infected
with trypanosomes (a hypothetical population).

The population conceived of as parental to an observed distribution is fundamental
to statistical inference. We shall take up this matter again in later chapters when we
consider the sampling problem. The point is mentioned here because it will occasionally
arise before we reach that chapter. It must be emphasised that the distinction between
existent and hypothetical universes is not merely a matter of ontological speculation (if
it were we could safely ignore it) but one of practical importance when inferences are
drawn about a population from a sample generated from it.


Multivariate Distributions 

1.30. In the foregoing sections we have considered the members of a population
according to a single variate, and the frequency-distributions may thus be called univariate.
The work may be readily generalised to include populations of members considered
according to two or more variates, yielding bivariate, trivariate . . . multivariate
frequency-distributions. Table 1.15, for example, shows the distribution of a number of beans
according to both length and breadth. The border frequencies show the univariate
distributions of the beans according to length and breadth separately, and the body of the
table shows how the two qualities vary together.

TABLE 1.15 


Showing Frequencies of Beans with specified Lengths and Breadths. 
(Johannsen’s data, cited by S. J. Pretorius (1930), Biometrika, 22, 110.) 


Lengths in millimetres (central values). 


El 17 hes 16 5 | 13 |125| 12 (11-5 11 10-5) 10 | 9-5 | Torars. 
A WA l | Je 
2 | | | | Е 
са 941295 |-—4 9| — | | —|— 
3| 8:875 |4| 8| 17 | = = 48 
91 8625 | 2 | 23| 101 SS) == И 400 
с 8:918 = | 18 | 105 | | | — —|— 1483 
B| 8125 |—| 4| 44 12 3 | 742 
5| 7:875 | — | = m | 89 19 “к= | 2579 
Ё| 7-625 |—|—]| 1 175 | 55| 27| 4 Zil 
= 75 | — | —|— 124| 78| 37|22]11|—| 1 | — 530 
Fall! —|—|— | SSF 95] Е art e gL — 170 
6-8 = Ө Saf Ll ло ЕЕ T | do 72 
E| 66 1— | es 2 | 1| 4) 5 1 — 10 
тү зо с=т — | | | x | —|1|1|1 4 
| Тота1ѕ | 6 | 55 | 275 | 1129 | 2082 Bue DN] 929 | 437 | 199 | 115 | 70 36 | 18| 7 | 1 | омо 
i | 


As for the univariate case, the variates may be discontinuous or continuous and we
sometimes meet cases in which one variate is of one kind and one of the other.


1.31. In generalisation of the frequency polygon and the histogram we may construct
3-dimensional figures to represent the bivariate distribution. Imagine a horizontal plane
containing a pair of perpendicular axes and ruled like a chessboard into cells, the ruled
lines being drawn at points corresponding to the terminal points of class-intervals. At
the centre of each interval we erect a vertical line proportional in length to the frequency
in that interval. The summits of these verticals are joined, each to the four summits of
verticals in the neighbouring cells possessing the same values of one or the other variate.
The resulting figure is the bivariate frequency polygon or Stereogram.

Similarly we may erect on each cell a pillar proportional in volume to the frequency
in that cell and thus obtain a bivariate histogram. Fig. 1.4 shows such a figure for the
bean data of Table 1.15.

Fig. 1.4. Bivariate Histogram of the Data of Table 1.15.


1.32. We may write the bivariate distribution with variates $x_1$, $x_2$, as
\[ dF = f(x_1, x_2)\,dx_1\,dx_2 \tag{1.19} \]
With the usual conventions we shall then have for the bivariate distribution function
\[ F(x_1, x_2) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f(x_1, x_2)\,dx_1\,dx_2 \tag{1.20} \]
this integral also being understood in the Stieltjes sense, reducing to ordinary integration
if $f(x_1, x_2)$ is continuous and to ordinary summation if it is discontinuous.


Independence 
1.33. If there are two distribution functions $F_1$, $F_2$, such that
\[ F(x_1, x_2) = F_1(x_1)\,F_2(x_2) \tag{1.21} \]
then $x_1$ and $x_2$ are said to be independent. Where frequency functions exist we have
\[ f(x_1, x_2) = f_1(x_1)\,f_2(x_2). \]
It is readily seen that this definition of statistical independence conforms to the colloquial
use of the word and also to its mathematical use. The distribution of $x_1$ for any fixed $x_2$
(e.g. the distribution in a row or column of the bivariate frequency table) is the same
whatever the fixed value of $x_2$, that is to say, the distribution of $x_1$ is independent of $x_2$.

Two variates which are not independent are said to be dependent. Evidently those
of Table 1.15 are dependent, for the distributions in rows or in columns are far from similar.

Generally, n variates are independent if
\[ F(x_1, \ldots, x_n) = F_1(x_1) \cdots F_n(x_n). \tag{1.22} \]
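
The definition of independence can be applied directly to a bivariate frequency table : every
cell, expressed as a proportion of the total, should equal the product of its row and column
proportions. The Python sketch below is an illustration only ; the two small tables are
hypothetical, made up for the example, and the name `is_independent` is likewise an
assumption.

```python
def is_independent(table, tol=1e-9):
    """True if every cell proportion equals the product of its border proportions."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return all(
        abs(cell / n - (rt / n) * (ct / n)) <= tol
        for row, rt in zip(table, row_totals)
        for cell, ct in zip(row, col_totals)
    )

# Hypothetical tables: in the first every row has the same proportions 1 : 2 : 1.
proportional_rows = [[10, 20, 10],
                     [30, 60, 30]]
unequal_rows = [[30, 10],
                [10, 30]]

print(is_independent(proportional_rows), is_independent(unequal_rows))
```

With observed data exact equality is not to be expected, and the question whether the
departures are larger than sampling would produce belongs to the theory of sampling.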

1.34. Transformations of the variate for bi- and multivariate distributions follow
the ordinary laws for the transformation of differentials. For example, if
\[ dF = f(x_1, x_2)\,dx_1\,dx_2 \]
and
\[ x_1 = x_1(\xi_1, \xi_2), \qquad x_2 = x_2(\xi_1, \xi_2) \]
we have
\[ dF = f\{x_1(\xi_1, \xi_2),\; x_2(\xi_1, \xi_2)\}\,J\,d\xi_1\,d\xi_2 \tag{1.23} \]
where J is the Jacobian
\[ J = \frac{\partial(x_1, x_2)}{\partial(\xi_1, \xi_2)}
     = \begin{vmatrix} \partial x_1/\partial \xi_1 & \partial x_1/\partial \xi_2 \\
                       \partial x_2/\partial \xi_1 & \partial x_2/\partial \xi_2 \end{vmatrix} \]
and is taken with a positive sign in (1.23).



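Consider, for example, the distribution
\[ dF = z_0 \exp\left\{ -\tfrac{1}{2}\left( \frac{x_1^2}{\sigma_1^2}
        - \frac{2\rho x_1 x_2}{\sigma_1 \sigma_2} + \frac{x_2^2}{\sigma_2^2} \right) \right\}
        dx_1\,dx_2, \qquad -\infty \le x_1, x_2 \le \infty \tag{1.24} \]
$z_0$, as usual, being chosen so that the total frequency is unity. The variates are evidently
dependent.

Put
\[ \xi_1 = \frac{x_1}{\sigma_1} - \frac{\rho x_2}{\sigma_2}, \qquad
   \xi_2 = (1 - \rho^2)^{1/2}\,\frac{x_2}{\sigma_2}. \]
We have
\[ \frac{\partial(\xi_1, \xi_2)}{\partial(x_1, x_2)}
   = \begin{vmatrix} 1/\sigma_1 & -\rho/\sigma_2 \\ 0 & (1 - \rho^2)^{1/2}/\sigma_2 \end{vmatrix}
   = \frac{(1 - \rho^2)^{1/2}}{\sigma_1 \sigma_2} \]
and
\[ \frac{x_1^2}{\sigma_1^2} - \frac{2\rho x_1 x_2}{\sigma_1 \sigma_2} + \frac{x_2^2}{\sigma_2^2}
   = \xi_1^2 + \xi_2^2. \]
The distribution then becomes
\[ dF = \frac{z_0\,\sigma_1 \sigma_2}{(1 - \rho^2)^{1/2}}
        \exp\{-\tfrac{1}{2}(\xi_1^2 + \xi_2^2)\}\,d\xi_1\,d\xi_2 \tag{1.25} \]
\[ = \frac{z_0\,\sigma_1 \sigma_2}{(1 - \rho^2)^{1/2}}\;
     e^{-\frac{1}{2}\xi_1^2}\,d\xi_1\; e^{-\frac{1}{2}\xi_2^2}\,d\xi_2 \tag{1.26} \]
The transformed variates $\xi_1$ and $\xi_2$ are thus independent.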
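
The algebra of such a reduction is readily checked by machine. The Python sketch below is
an illustration under stated assumptions : it takes the substitution to be
ξ₁ = x₁/σ₁ − ρx₂/σ₂, ξ₂ = (1 − ρ²)^{1/2} x₂/σ₂, and verifies at a number of arbitrary points
that the quadratic expression in x₁, x₂ is equal to ξ₁² + ξ₂², which is what allows the
frequency function to separate into a factor in ξ₁ alone and a factor in ξ₂ alone. The
parameter values and function names are hypothetical.

```python
import math
import random

def quadratic_form(x1, x2, s1, s2, rho):
    """x1^2/s1^2 - 2 rho x1 x2/(s1 s2) + x2^2/s2^2, the expression in the exponent."""
    return x1**2 / s1**2 - 2 * rho * x1 * x2 / (s1 * s2) + x2**2 / s2**2

def to_xi(x1, x2, s1, s2, rho):
    """Assumed substitution: xi1 = x1/s1 - rho x2/s2, xi2 = sqrt(1 - rho^2) x2/s2."""
    return x1 / s1 - rho * x2 / s2, math.sqrt(1 - rho**2) * x2 / s2

def jacobian(s1, s2, rho):
    """Determinant of d(xi1, xi2)/d(x1, x2); the matrix is triangular."""
    return (1 / s1) * (math.sqrt(1 - rho**2) / s2)

rng = random.Random(0)
s1, s2, rho = 2.0, 0.5, 0.6          # hypothetical parameter values
worst = 0.0
for _ in range(1000):
    x1, x2 = rng.uniform(-5, 5), rng.uniform(-5, 5)
    xi1, xi2 = to_xi(x1, x2, s1, s2, rho)
    worst = max(worst, abs(quadratic_form(x1, x2, s1, s2, rho) - (xi1**2 + xi2**2)))
print(worst, jacobian(s1, s2, rho))
```

The Jacobian is a constant, so that it affects only the constant multiplier of the transformed
distribution and not the separation into two factors.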


NOTES AND REFERENCES 


The collection of definitions of statistics by Willcox (1935) has already been referred
to in the text.

Examples of practical frequency-distributions will be found in most statistical journals,
particularly Biometrika.

As to the mathematical basis of the theory of frequency-distributions, there appears
to be no account in English. The reader who is interested should, however, make a point
of reading two French works, that by Lévy and those by Fréchet in the Borel Traité. Both
these are written from the standpoint of the theory of probability, but the basic ideas of
the theory of frequency-distributions are the same whether probability is concerned or not.

Borel, E., Traité du Calcul des Probabilités, Gauthier-Villars, Paris. A series of brochures
written under the general editorship of M. Borel. See particularly the two by
M. Fréchet called “Nouveaux Recherches.”

Lévy, P., Calcul des Probabilités, Gauthier-Villars, Paris.

Shohat, J. (1929), “Stieltjes Integrals in Mathematical Statistics,” Ann. Math. Statist.,
1, 78.

Willcox, W. F. (1935), “Definitions of Statistics,” Revue de l'Inst. Int. de Statistique, 3, 388.




EXERCISES 


1.1. Draw frequency polygons or histograms of the following distributions :— 


TABLE 1.16
Frequency-Distribution of Successes in Twelve Dice thrown 4096 Times, a Throw of 6 Points
reckoned as a Success.

(Weldon's data; loc. cit., Table 1.14.)

Number of Successes |   0 |    1 |    2 |   3 |   4 |   5 |  6 | 7 and over | TOTAL
Number of Throws    | 447 | 1145 | 1181 | 796 | 380 | 115 | 24 |          8 |  4096

TABLE 1.17

Frequency-Distribution of Size of Firms in the Food, Drink and Tobacco Trades of Great
Britain.

(Final Report of the Fourth Census of Production, 1930, Part III. The table shows the
number of firms employing, on an average, certain numbers of persons.)

Size of Firm (Average Numbers Employed) | Number of Firms
11-24                                   | 2245
25-49                                   | 1449
50-99                                   |  771
100-199                                 |  439
200-299                                 |  164
300-399                                 |   75
400-499                                 |   36
500-749                                 |   54
750-999                                 |   31
1000-1499                               |   23
1500 and over                           |   29
TOTAL                                   | 5316


TABLE 1.18

Frequency-Distribution of Plots according to Yield of Grain in Pounds from Plots of 1/500th
Acre in a Wheat Field.

(Mercer and Hall (1911), Jour. Agr. Science, 4, 107.)

Yield of Grain in pounds per 1/500th Acre (central value of range) | Number of Plots
2.8                                                                |   4
3.0                                                                |  15
3.2                                                                |  20
3.4                                                                |  47
3.6                                                                |  63
3.8                                                                |  78
4.0                                                                |  88
4.2                                                                |  69
4.4                                                                |  59
4.6                                                                |  35
4.8                                                                |  10
5.0                                                                |   8
5.2                                                                |   4
TOTAL                                                              | 500


TABLE 1.19

The Percentages of Deaf-mutes among Children of Parents One of whom at least was a Deaf-mute,
for Marriages producing Five Children or More.

(Compiled by G. Udny Yule from material in Marriages of the Deaf in America, ed. E. A.
Fay, Volta Bureau, Washington, 1898. Where a family fell on the border line between
two class-intervals one-half was assigned to each.)

Percentage of Deaf-mutes | Number of Families
0-20                     | 220
20-40                    |  20.5
40-60                    |  12
60-80                    |   5.5
80-100                   |  15
TOTAL                    | 273


TABLE 1.20

Showing the Frequency-Distribution of Fecundity, i.e. the Ratio of the Number of Yearling
Foals produced to the Number of Coverings, for Brood-mares (Racehorses) covered Eight Times
at least.

(Pearson, Lee and Moore (1899), Phil. Trans., A, 192, 303. Where a case fell on the border
between two intervals, one-half was assigned to each.)

Fecundity    | Number of Mares with Fecundity between the Given Limits
1/30- 3/30   |    2
3/30- 5/30   |    7.5
5/30- 7/30   |   11.5
7/30- 9/30   |   21.5
9/30-11/30   |   55
11/30-13/30  |  104.5
13/30-15/30  |  182
15/30-17/30  |  271.5
17/30-19/30  |  315
19/30-21/30  |  337
21/30-23/30  |  293.5
23/30-25/30  |  204
25/30-27/30  |  127
27/30-29/30  |   49
29/30-1      |   19
TOTAL        | 2000.0


TABLE 1.21

Showing Numbers of Sentences of given Lengths in Passages from Macaulay's Essays on
Bacon and on Chatham.

(From G. Udny Yule (1939), Biometrika, 30, 363.)

Length of Sentence in Words | Number of Sentences
1-5                         |   46
6-                          |  204
11-                         |  252
16-                         |  200
21-                         |  186
26-                         |  108
31-                         |   61
36-                         |   68
41-                         |   38
46-                         |   24
51-                         |   20
56-                         |   12
61-                         |    8
66-                         |    2
71-                         |    4
76-                         |    8
81-                         |    2
86-                         |    2
91-                         |    1
96-                         |    2
101-                        |    1
106-                        |    -
111-                        |    1
116-                        |    -
121-                        |    1
TOTAL                       | 1251


TABLE 1.22 


Showing the Numbers of Old Egyptian Skulls with Specified Lengths of the Left Occipital
Bone in millimetres. 


(From T. L. Woo (1930), Biometrika, 22, 324.) 


! I 
| Length E 
Тепа Frequency. EE ise Frequency. 
(central values). | 1 y (central values). |. 
| —- — 
| 9.5 74 
ui s im T 
86-5 | 12 1-5 É 
2 106-5 3 
is 8 108-5 18 
90-5 48 = 
79 110-5 
т 116 112-5 . 4 
96-5 | 104 1145 4 
l 6 116-5 чы 
“л. 98:5 Н 126 Tine е 
© 100-5 123 
$ ^ TOTAL 864 
Б 




TABLE 1.23
Showing the Number of Women Aborting at Specified Term in Weeks.
(From T. V. Pearce (1930), Biometrika, 22, 250.)

Term in Weeks | Frequency
4             |   3
5             |   7
6             |  10
7             |  13
8             |  14
9             |  29
10            |  22
11            |  21
12            |  18
13            |  28
14            |  16
15            |  19
16            |  10
17            |  13
18            |  14
19            |   8
20            |   4
21            |   2
22            |  10
23            |   4
24            |   4
25            |   3
26            |   4
27            |   6
28            |   1
TOTAL         | 283


1.2. Sketch the following curves and compare their shapes with those of the
distributions in the previous exercise:


у = уе ? — 0 «zz o 
y =ye 7 l&zr« о 
y = yor? y>1,0 <ra o 

‚ Уо 
7 Uc п> 0, — ос<х< о 
у = у0(1 — х)? 4,6>10<2<1 
= ye "x? y>1,0<2< о 
Y= Yoh sse @=0, —l«r«l. 


1.3. Show that the following distributions can all be transformed into the type
$$dF = y_0 (1 - x)^{p-1} x^{q-1}\,dx, \qquad 0 \le x \le 1,$$
and find the transformations:
n—4 

direc ri eet qe =l<er<] 

t 
ero Кн — co «l« o 

1 >т - 

206" 

аР = af 1502 — 0 <2< o 


(me? +) 2 
(All these distributions are important in statistical theory. The distribution to which
they are reduced is called the Type I or B-distribution.)

1.4. Sketch the stereograms or bivariate histograms of the following distributions:

TABLE 1.24


Number of Families deficient in Room Space in 95 crowded London Wards. 


(Census of 1931, Housing Report, p. xxxii.) 


Standard Room Requirement (Rooms). 


Families deficient by 2 | 3 4 5 6 7 8 Torars. 
1 room 12,999 | 18,198 7,124 | 2,170 164 19 41,274 
2 rooms 3,054 4,479 1,448 221 15 9,218 
3 rooms | 310 508 106 4 929 
4 rooms | 10 21 4 35 
Torats | 12,999 | 21,252 | 12,513 | 4,136 512 42 51,456 

TABLE 1.25 


Number of Cows Distributed according to (1) Age in Years and (2) Yield of Milk per Week 
in 4912 Ayrshire Cows. ; 


(Data from J. F. Tocher (1928), Biometrika, 20B, 100.) 


(1) Age in Years. 


A з [а [з [еј [е |е (20-а пауз ES ES р 
5 8 E рн eeu | ба ПЕЧЕ ЕУ ЕУ Е Е 1 
5 9 == 2 жш | |—|—р—|—|—|—[—[ ele 5 
n x 3 | == ү | SS SS i — A 
а 10 S 5| 1 1) Si s : 
ol 11 ЧСС | | tl c) hee катту не mm] у 
5 12 Б bel av] a| УО Ае Б TES m 7" 
s| эз oi ve] oof 18] Е Ч cm | ie 
=| H iil zel 571 38) 23] 28) т врме 
В| 15 Thais | | аз 3] 2] 11] 8] ponin e RU E EE EDS 
Е 16 15 | 149 | 119| 74| 59) 23| 23| 16 Oi), 2 4 | Еа o E e 499 
$| 1 16 | tas | 1st | 94| бв] 34} за 15 раво 
^| 18 11 | 146 | 132| 921 73| 49| 30) 22| 17| 6| 5| 2 1 = Е ees 
aj 19 10 | 117 | 112 | 113 | 87] 51| 35} 33] 11/10) 2) 3 £s ss 1| 586 
g| 20 o mr tor | vo| e| e| 25| 30| 35 |10| 2| 8) || E | enn 
SESS HE БАКЕ Л БЕШКЕ Ыис е = Ex EE 
3| 9 31 Жү | 49]. 4б] 32| 14. ЕО ЕЕЕ md m 284 
S| 23 1| тө] a8) 38) 38) зултат 1 2 SE a 
*| 2 $| бо! $3]-32| 27] 19| 18] este By) з = |—= || = 5 
E 29 SL xb Te за 17 | 202.958.] M03 бун TE | oen SSS i 
EL 5 1I ЕТИКЕ ЕЕ ana 5 
Pts» || жк шр Бү а ааа куна d ани тй. 
ИЕ ОЗЕ Иа 1з 
=] 20 = Чү s Ла XE. = [|| == == 5 
^ 30 | | s | 5 = : 
ч 81 ty, Жи 2 Mem | z5 9 f-— = 5 
а=] 32 => = E 2 i = xd : 
S| 33 | 1 
2] 34 = Em m iE | = me Ред A 1 

E 75 5 9 
=| Torazs | 112 | 1129 1047| 812 | 636 | 419 | 276 | 223 | 122 | 75 32|15| 7] 2| 4| 1| 4912 


TABLE 1.26

Distribution of Weekly Returns according to (1) Call Discount Rates and (2) Percentage of
Reserves on Deposits in New York Associated Banks.
(From Statistical Studies in the New York Money Market, by J. P. Norton. Publications
of the Department of the Social Sciences, Yale University; The Macmillan Co., 1902.) Note
that, after the column headed 8 per cent., blank columns have been omitted to save space.

(1) Call Discount Rates.


1|rs5|2 |25! 3 |35| 4 |45| 5 |55! 6 [65 7 |7518 | 9 |10112 [15 |20 |25 Тотліз 
аа ЕЕЕ 1| = Em | ° 
2E (2 = k=] 1 
Z| 23 = Sl SS SS e SS ss г nd ee el ns 1 
$| 24 = | — | 1 | 2 3 2 = 1 9 
Өү 25 |—|—|—|—|—| 1| 2| e| 4| alan Lt ELI—|o|zar2|i iij a9 
Биби 286 E О Е ЕТ" рр 1.91 1091.1 1] 1| 2i 85 
$| 27 |—| 1) 10) 911412 | 15 [17| 19| 9| 9| 3] 4 jj Ems et 1 124 
age ies. 8023120100) 73) 7| 1| 21 9| 3 zx es ee IL s 116 
E зи в АВ 151364 31. вда 11| | | = 1—11 1l 109 
celle BBD. т TS St aa eset о pee orm m —|—|—|—|—|—|--| з 
EA E ssim 6) 2) A 24 9) af 1| SS all + 
2e ee 10) зБ 1) —) — — —|—|—|— —| 53 
ж ЗК ЕВ 4h Lj TOREM =. ee s | eats —|—|—|—|—|—|— 32 
teqq 3 —|—|— — = — = = 14 
B ШЗ ШЕКТЕ | — —|—|—! = = —|—|—-|—|-|—I—| 14 
ё) 36 пол | [== — е == ра 0—16) a5 
BESTAND SEL |= = = = 9 
`| 38 олу 1|—c|—|—|— Shee беш ише |S | SS ES EE Se Sn 
= 39 ЕЛА! —|— |= — — — || 91 
S| 40 718. — | 1 | = ==) MES |—|—|—|—|—|—| 15 
5| 41 7| 3|—|—|— = зүр ш кж Е Б Бай бн Бер СЕ —— ШШ шыс Шы т 10 
| 42 8) 2|—|—|—| I—1 |— — — —|—|—|—|—|—| 10 
e| 43 MES РЕ E seu ee IL 
є а к сг es ae lle E 
IA 05 эс m nd in d a 
45 2 | = == al md al m a 9 
Torars| 121 93 |125| 70 69 | 40 | 52 45 | 52 | 20 | 35 | 10 | 18 | | 10 | 4 тр 1| 4| 780 
1.5. Show that the conditions that the function
$$f(x_1, x_2) = z_0 \exp\{A x_1^2 + 2H x_1 x_2 + B x_2^2\}, \qquad -\infty \le x_1, x_2 \le \infty,$$
may represent a frequency function are

(a) $A < 0$
(b) $B < 0$
(c) $AB - H^2 > 0$.

Show further that if these conditions are satisfied and the integral of $f(x_1, x_2)$ between
$-\infty$ and $\infty$ for both variates is unity, then
$$z_0 = \frac{1}{\pi} \begin{vmatrix} -A & H \\ H & -B \end{vmatrix}^{1/2}.$$

CHAPTER 2 
MEASURES OF LOCATION AND DISPERSION 


2.1. It has been seen in Chapter 1 that the frequency-distributions occurring in 
statistical practice vary considerably in general nature. Some are finite in range and 
some are not. Some are symmetrical and some markedly skew. Some present only 
a single maximum and others present several. Amid this variety we may, however, 
discern four general types: (a) the symmetrical distribution with a single maximum, 
such as that of Table 1.7; (b) the asymmetrical distribution, or skew distribution, with 
a single maximum, such as those of Tables 1.8 and 1.9; (c) the extremely skew, or J-shaped, 
distribution, such as that of Table 1.2; and (d) the U-shaped distribution, such as that 
of Table 1.11. To make this classification comprehensive we should have to add a fifth 
class comprising the miscellaneous distributions not falling into the other four. 

The distributions with a single maximum will hereafter be called “unimodal.” The
synonymous terms “cocked-hat,” “single-humped” and one or two others also occur in
statistical literature.


2.2. It frequently happens in statistical work that we have to compare two distributions.
If one is unimodal and the other J-shaped or multimodal a concise comparison
is clearly difficult to make, and in such a case it would probably be necessary to specify
both distributions completely. But if both are of the same type (and it is in such cases
that comparisons most frequently arise) we may be able to make a satisfactory comparison
merely by examining their principal characteristics; e.g. if both are unimodal it might
be sufficient to compare (a) the whereabouts of some central value, such as the maximum
(this, as it were, locates the distributions); (b) the degree of scatter about this value
(the dispersion); and (c) the extent to which the distributions deviate from the
symmetrical (the skewness).
The same point emerges when our distributions are specified by some mathematical
function. If, for example, we have two distributions of the type
$$dF = y_0\, e^{-\frac{(x-m)^2}{v}}\,dx,$$
symmetrical about $x = m$, a complete comparison can be made by comparing the values
of the constants m and v in the distributions. Such constants are called parameters of
the distribution. This chapter is devoted to a discussion of parameters of location and
dispersion.


Measures of Location: the Arithmetic Mean 
2.3. There are three groups of measures of location in common use: the means (arithmetic,
geometric and harmonic), the median and the mode. We consider them in turn.
The arithmetic mean is perhaps the most generally used statistical measure, and in
fact is far older than the science of statistics itself. If the proportional frequency of the
values x of a distribution is f(x), the arithmetic mean $\mu_1'$ about the point $x = a$ is defined by
$$\mu_1' = \int_{-\infty}^{\infty} (x - a) f(x)\,dx = \int_{-\infty}^{\infty} (x - a)\,dF. \qquad (2.1)$$


This integral is to be understood in the Stieltjes sense and hence includes summation in
the discontinuous case; e.g. the arithmetic mean of a set of discrete values x is their sum
divided by the number of values. In formula (2.1) the frequency, in accordance with
our usual convention, is expressed as a proportion of the total frequency. If the actual
frequencies are g(x), totalling N, we have
$$\mu_1' = \frac{1}{N} \int_{-\infty}^{\infty} (x - a) g(x)\,dx$$
in the continuous case, and
$$\mu_1' = \frac{1}{N} \sum_{j=-\infty}^{\infty} (x_j - a) g(x_j)$$
in the discontinuous case. The value of the arithmetic mean thus depends on the value
of a, the point from which it is measured. For a mathematically specified distribution
the integral (2.1) need not necessarily converge, in which case no arithmetic mean exists.


2.4. The calculation of the arithmetic mean of a numerically specified distribution 
(i.e. one whose frequency-distribution is given in the form of a numerical table like those 
of Chapter 1) is a simple process. If there are relatively few values in the population 
we merely sum them and divide by their total number N. If they are given in the form
of a frequency table a more formal procedure is desirable, but the principle is exactly the 
same. The following example will make the process clear, 


Example 2.1 


To calculate the arithmetic mean of the population of males distributed according to
height of Table 1.7.

Let us note first of all that if b is some other arbitrary variate-value,
$$\mu_1' \text{ (about } a) = \int (x - a)\,dF = \int (x - b)\,dF + (b - a)\int dF = \mu_1' \text{ (about } b) + b - a. \qquad (2.2)$$

In other words, we can find the mean about any point very simply when we know
the mean about any other. In calculating the arithmetic mean we can then take an
arbitrary point as origin and transfer to any other desired point afterwards. It is
convenient to choose this arbitrary origin so as to simplify the arithmetic.

One further point arises in dealing with grouped data: we do not know the
variate-values of the individuals within a certain class range. We therefore assume them
concentrated at the centre of the interval. Corrections for any distortion thus introduced
will be considered in Chapter 3. In fact, no correction is required for the arithmetic
mean in the case when the frequency “tails off” at both ends of the distribution.

In the particular case before us we take an arbitrary origin at the centre of the interval
67- inches, i.e. at the point 67 7/16 inches, and measure $\xi$ ($= x - a$) from that point. Column 2
in Table 2.1 shows the frequency, column 3 the value of $\xi$ and column 4 the value of $\xi f$.
We find, having due regard to sign,

$$S(\xi f) = 8763 - 8584 = 179.$$

Hence the mean about $x = 0$ is
$$67\tfrac{7}{16} + \frac{179}{8585} = 67.46 \text{ inches.}$$


TABLE 2.1

Calculation of the Arithmetic Mean for the Distribution of Table 1.7.

(1)             | (2)           | (3)                                  | (4)
Height, inches. | Frequency f.  | Deviation from Arbitrary Value, ξ.   | Product ξf.
57-             |     2         | -10                                  |    20
58-             |     4         |  -9                                  |    36
59-             |    14         |  -8                                  |   112
60-             |    41         |  -7                                  |   287
61-             |    83         |  -6                                  |   498
62-             |   169         |  -5                                  |   845
63-             |   394         |  -4                                  |  1576
64-             |   669         |  -3                                  |  2007
65-             |   990         |  -2                                  |  1980
66-             |  1223         |  -1                                  |  1223
                |               |                          (negative)  | -8584
67-             |  1329         |   0                                  |     0
68-             |  1230         |  +1                                  |  1230
69-             |  1063         |  +2                                  |  2126
70-             |   646         |  +3                                  |  1938
71-             |   392         |  +4                                  |  1568
72-             |   202         |  +5                                  |  1010
73-             |    79         |  +6                                  |   474
74-             |    32         |  +7                                  |   224
75-             |    16         |  +8                                  |   128
76-             |     5         |  +9                                  |    45
77-             |     2         | +10                                  |    20
TOTALS          |  8585         |                          (positive)  | +8763
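
The arithmetic of Table 2.1 can be reproduced mechanically. The sketch below is illustrative only: the function name and argument layout are assumptions, and the class centres are taken at 7/16 inch above the class label, as in the example.

```python
from fractions import Fraction

# Frequencies of Table 2.1, for the classes 57-, 58-, ..., 77- inches.
FREQ = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
        1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]


def mean_by_arbitrary_origin(freqs, first_class, origin_class,
                             centre_offset=Fraction(7, 16)):
    """Return (N, S(xi*f), mean) using deviations xi from an arbitrary origin.

    The origin is the centre of `origin_class`; deviations are measured
    in units of the class-interval (one inch here).
    """
    n = sum(freqs)
    s = sum((first_class + i - origin_class) * f for i, f in enumerate(freqs))
    origin = origin_class + centre_offset
    return n, s, origin + Fraction(s, n)


if __name__ == "__main__":
    n, s, mean = mean_by_arbitrary_origin(FREQ, first_class=57, origin_class=67)
    print(n, s, float(mean))
```

By equation (2.2) the result does not depend on which class is chosen as origin; only the size of the intermediate products changes.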


Example 2.2 


For a distribution specified by a mathematical function, the determination of the
mean is a matter of evaluating the integral (2.1), when it exists. For instance, to find
the mean of the distribution
$$dF = \frac{1}{B(p, q)} (1 - x)^{p-1} x^{q-1}\,dx, \qquad 0 \le x \le 1.$$
We have
$$\mu_1' = \frac{1}{B(p, q)} \int_0^1 (1 - x)^{p-1} x^{q}\,dx = \frac{B(p, q+1)}{B(p, q)} = \frac{\Gamma(p)\Gamma(q+1)}{\Gamma(p+q+1)} \cdot \frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)} = \frac{q}{p+q}.$$


2.5. Apart from its relative simplicity and ease of calculation, qualities which ensure 
it a firm place in the elementary theory of statistics, the arithmetic mean has a number 
of properties which make it equally important in advanced theory. For instance :— 

(a) If in (2.1) we take a equal to $\mu_1'$ itself the mean vanishes and consequently the
sum over the population of deviations from the arithmetic mean is zero.

(b) The mean of a sum is the sum of the means; i.e. if $f_1, f_2, \ldots, f_n$ are the frequency
functions of n distributions with means $\mu_1', \nu_1', \ldots, \rho_1'$, and if the sum of the frequency
functions is g with mean $\theta_1'$, then
$$\theta_1' = \int (x - a) g(x)\,dx = \int (x - a)\{f_1(x) + \ldots + f_n(x)\}\,dx$$
$$= \int (x - a) f_1(x)\,dx + \int (x - a) f_2(x)\,dx + \ldots + \int (x - a) f_n(x)\,dx = \mu_1' + \nu_1' + \ldots + \rho_1'.$$

(c) We shall see later that mean values are important in the theory of sampling,
mainly in virtue of their mathematical tractability, but also because in a certain sense
the mean is the best measure of location of some distributions.


The Geometric Mean and the Harmonic Mean 
2.6. Two other types of mean are in use in elementary statistics, though they are 
not of importance in advanced theory. 
The geometric mean of N variate-values is the Nth root of their product and is not used
if any of the variate-values are negative. For proportional frequencies f(x) we have
$$G = \prod_{j} x_j^{f_j} \quad \text{or} \quad \log G = \sum_{j} f_j \log x_j, \qquad (2.3)$$
and for actual frequencies g(x), totalling N,
$$G = \Big\{\prod_{j} x_j^{g_j}\Big\}^{1/N}, \qquad \log G = \frac{1}{N} \sum_{j} g_j \log x_j.$$

The harmonic mean of N variate-values is the reciprocal of the arithmetic mean of
their reciprocals. In the usual notation
$$\frac{1}{H} = \int_{-\infty}^{\infty} \frac{f(x)\,dx}{x} \qquad (2.5)$$
or, for actual frequencies,
$$\frac{1}{H} = \frac{1}{N} \int_{-\infty}^{\infty} \frac{g(x)\,dx}{x}. \qquad (2.6)$$
Example 2.3
To find the geometric and harmonic means of the distribution
$$dF = \frac{1}{B(p, q)} (1 - x)^{p-1} x^{q-1}\,dx, \qquad 0 \le x \le 1,$$
we have
$$\log G = \frac{1}{B(p, q)} \int_0^1 (1 - x)^{p-1} x^{q-1} \log x\,dx.$$
Now, since by definition
$$\int_0^1 (1 - x)^{p-1} x^{q-1}\,dx = B(p, q),$$
we have, differentiating both sides with respect to q, an operation which is legitimate
in virtue of the uniform convergence of the integral and the existence of the resulting
expressions,
$$\int_0^1 (1 - x)^{p-1} x^{q-1} \log x\,dx = \frac{\partial}{\partial q} B(p, q).$$
Thus
$$\log G = \frac{1}{B(p, q)} \frac{\partial}{\partial q} B(p, q) = \frac{\partial}{\partial q} \log \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)} = \frac{\partial}{\partial q} \{\log \Gamma(q) - \log \Gamma(p+q)\}.$$
The harmonic mean is given by
$$\frac{1}{H} = \frac{1}{B(p, q)} \int_0^1 (1 - x)^{p-1} x^{q-2}\,dx = \frac{B(p, q-1)}{B(p, q)} = \frac{\Gamma(q-1)}{\Gamma(p+q-1)} \cdot \frac{\Gamma(p+q)}{\Gamma(q)} = \frac{p+q-1}{q-1},$$
so that
$$H = \frac{q-1}{p+q-1}.$$
We may note that the arithmetic mean, $\mu_1' = \dfrac{q}{p+q}$, is greater than the harmonic mean, for
$$\frac{q}{p+q} - \frac{q-1}{p+q-1} = \frac{p}{(p+q)(p+q-1)},$$
and therefore $\mu_1' > H$ if $p > 0$, which is clearly so.
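
As a numerical illustration of Examples 2.2 and 2.3, the three means of this distribution may be approximated by simple quadrature. This is only a sketch: the midpoint rule, the step count and the function name are assumptions made for illustration, and q is taken greater than 1 so that the harmonic mean exists.

```python
import math


def beta_means(p, q, steps=20_000):
    """Approximate the arithmetic, geometric and harmonic means of
    dF proportional to (1-x)^(p-1) x^(q-1) dx on (0, 1) by the midpoint rule."""
    h = 1.0 / steps
    norm = first = log_sum = recip = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = (1.0 - x) ** (p - 1) * x ** (q - 1)  # unnormalised frequency
        norm += w
        first += w * x               # contributes to the arithmetic mean
        log_sum += w * math.log(x)   # contributes to log G
        recip += w / x               # contributes to 1/H
    return first / norm, math.exp(log_sum / norm), norm / recip


if __name__ == "__main__":
    p, q = 2.0, 3.0
    a, g, h = beta_means(p, q)
    print(a, q / (p + q))            # arithmetic mean against q/(p+q)
    print(h, (q - 1) / (p + q - 1))  # harmonic mean against (q-1)/(p+q-1)
    print(g)                         # geometric mean lies between the two
```

For p = 2, q = 3 the closed forms give 3/5 for the arithmetic mean and 1/2 for the harmonic mean, and the approximations should agree with these to several decimal places.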


2.7. In general it may be shown that for distributions in which the variate-values
are not negative
$$\mu_1' \ge G \ge H. \qquad (2.7)$$
Consider in fact the quantity
$$E(t) = \left\{\frac{x_1^t + x_2^t + \ldots + x_N^t}{N}\right\}^{1/t}, \qquad (2.8)$$
where the x's are positive numbers. We shall show that this is an increasing function
of t, i.e. $E(t_1) > E(t_2)$ if $t_1 > t_2$. As a trivial case these inequalities may be replaced by
equalities, namely if all the x's are equal. We have
$$\frac{d}{dt} \log E = \frac{d}{dt}\left\{\frac{1}{t} \log \frac{\Sigma x^t}{N}\right\} = -\frac{1}{t^2} \log \frac{\Sigma x^t}{N} + \frac{1}{t} \frac{\Sigma x^t \log x}{\Sigma x^t}.$$

Hence, for the function
$$F = t^2 \frac{d}{dt} \log E$$
we have
$$\frac{dF}{dt} = \frac{d}{dt}\left\{-\log \frac{\Sigma x^t}{N} + t\,\frac{\Sigma x^t \log x}{\Sigma x^t}\right\} = \frac{t}{(\Sigma x^t)^2}\left\{\Sigma(x^t \log^2 x)\,\Sigma(x^t) - (\Sigma x^t \log x)^2\right\}.$$

Now in virtue of Schwarz's inequality $\Sigma(a^2)\Sigma(b^2) \ge \{\Sigma(ab)\}^2$ the expression in brackets is
not negative. Hence $dF/dt$ has the sign of t and F thus has a minimum at $t = 0$. But
when $t = 0$, $F = 0$ and thus F must be non-negative. Therefore $\dfrac{d}{dt} \log E$ is non-negative,
and since E is positive $\dfrac{dE}{dt}$ is non-negative and thus E is a non-decreasing function.

Now in E(t), when $t = 1$ we have the arithmetic mean; when $t = -1$ we have the
harmonic mean; and when $t \to 0$ we have the geometric mean, for
$$\lim_{t \to 0} \log E = \lim_{t \to 0} \frac{\log(\Sigma x^t / N)}{t} = \lim_{t \to 0} \frac{\Sigma x^t \log x}{\Sigma x^t} = \frac{1}{N} \Sigma \log x.$$

Hence the inequality (2.7) follows.
For simplicity we have stated these results for the discontinuous variate. The analysis,
however, is easily seen to remain true for Stieltjes integrals and hence is generally valid.
Hereafter when the “mean” is mentioned without qualification, the arithmetic mean
is to be understood.
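
The monotonic behaviour of E(t) is easily seen on a small set of values. The following sketch uses an arbitrary illustrative data set and an assumed function name; the case t = 0 is handled by its limiting value, the geometric mean.

```python
import math


def power_mean(xs, t):
    """E(t) = {(sum of x^t) / N}^(1/t) for positive xs; t = 0 gives the limit G."""
    n = len(xs)
    if t == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** t for x in xs) / n) ** (1.0 / t)


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 8.0]          # illustrative positive values
    for t in (-3, -2, -1, 0, 1, 2, 3):
        print(t, power_mean(xs, t))    # values should not decrease with t
```

In particular the values at t = -1, 0 and 1 are the harmonic, geometric and arithmetic means, so that the printed sequence exhibits the inequality (2.7) directly.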


The Median 


2.8. The median value is that value of the variate which divides the total frequency
into two equal halves, i.e. is the value Me such that
$$\int_{-\infty}^{Me} f(x)\,dx = \int_{Me}^{\infty} f(x)\,dx = \tfrac{1}{2}.$$

There is some small indeterminacy in this definition when the distribution is discontinuous,
which may be removed by convention. If there are (2N + 1) members of the population,
we take the median to be the value of the (N + 1)th member. If there are 2N we take it
to be halfway between the values of the Nth and the (N + 1)th. When the distribution
is numerically specified in class-intervals there is the usual indeterminacy due to grouping,
which may be dealt with in the manner of the following example.


Example 2.4
To find the median value of the distribution of heights considered in Example 2.1.
Half the total frequency of 8585 observations is                      4292.5
There are, up to and including the interval ending at 66 15/16 inches 3589
leaving                                                                703.5
The frequency in the next interval is                                 1329
Hence we take the median to be
$$66\tfrac{15}{16} + \frac{703.5}{1329} = 67.47 \text{ inches.}$$

The mean (Example 2.1) is 67.46 inches, practically the same.
A graphical method of determining the median is given later in this chapter (2.13).


The Mode

2.9. The mode or modal value is that value of the variate exhibited by the greatest
number of members of the distribution. If the frequency function is continuous and
differentiable it is the solution of
$$f'(x) = \frac{df}{dx} = 0, \qquad f''(x) = \frac{d^2 f}{dx^2} < 0. \qquad (2.10)$$

If $f'(x)$ vanishes and $f''(x)$ is greater than zero we have a minimum, and such a point is
sometimes called an Antimode.

In numerically specified distributions and discontinuous distributions generally the
mode is sometimes difficult to determine exactly. It is essentially a concept related to
the continuous frequency function. For example, if the distribution merely consists of
an isolated number of values, each of which occurs only once, there is no mode in the
sense defined above. Where, however, the number is large enough to permit grouping,
there will usually be an interval containing a maximum frequency, and we may regard
the mode as lying in that interval. More generally there may be several maxima, in
which case the distribution is multimodal. In the height distribution of Table 1.7, for
instance, the mode may be considered as lying somewhere in the interval 67- inches.
To estimate its position more accurately it is necessary to fit a continuous curve to the
distribution and determine the mode of the curve. The process of fitting will be considered
in Chapter 6.

2.10. In a symmetrical distribution the mean, the median and the mode (or in
cases such as the U-shaped distribution, the antimode) coincide. For skew distributions
they differ. There is an interesting empirical relationship between the three quantities
which appears to hold for unimodal curves of moderate asymmetry, namely

Mean - Mode = 3 (Mean - Median). (2.11)

A mathematical explanation of this relationship has been given by Doodson (1917).
It is a useful mnemonic to observe that the mean, median and mode occur in the same
order (or the reverse order) as in the dictionary; and that the median is nearer to the mean
than to the mode, just as the corresponding words are nearer together in the dictionary.

In elementary work the median and the mode have considerable claims to use as
measures of location. They are readily interpretable in terms of ordinary ideas (the
median is the middle value and the mode is the most popular value) and the median
is usually more easily determined than the mean in numerically specified distributions.


What gives the arithmetic mean the greater importance in advanced theory is its superior 
mathematical tractability and certain sampling properties; but the median has com- 
pensating advantages—it is, for instance, less dependent on the scale and the form of the 
frequency-distribution than the mean—and it seems to deserve more consideration in the 
advanced theory than it has received. 


Quantiles 


2.11. The concept of median value can be easily extended to locate the curve more 
accurately by the use of several parameters. We may, for example, find the three variate- 
values which divide the total frequency into four equal parts. The middle one of these 
will be the median itself ; the other two are called the lower and upper quartiles respectively. 
Similarly, we may find the nine variate-values which divide the total frequency into ten 
equal parts—the deciles. Generally we may find the (n — 1) variate-values which divide 
the total frequency into n equal parts—the quantiles. Evidently the knowledge of the 
quantiles for some fairly high n, such as 10, gives a very good idea of the general form 
of the frequency-distribution. Even the quartiles and the median are valuable general 
guides.

2.12. The determination of the quantiles of a numerically specified distribution
proceeds as for the median, indeterminacies being resolved by the usual conventions.
That of the quantiles of a mathematically specified distribution, say the jth quantile,
is a matter of solving the equation
$$\int_{-\infty}^{x} dF = \frac{j}{n}, \qquad (2.12)$$
which can be done without difficulty by interpolation when the integral of dF has been
tabulated.


Example 2.5
To find the quartiles of the height distribution considered in Example 2.1.

One-quarter of the total frequency is 8585/4 =  2146.25
Up to the interval 65- there are                1376 members
leaving                                          770.25 members
In the next interval there are                   990 members

Thus the lower quartile is
$$64\tfrac{15}{16} + \frac{770.25}{990} = 65.71 \text{ inches.}$$
The upper quartile will be found to be 69.21 inches.

We have already found (Example 2.4) that the median is 67.47 inches.
Denoting the quartiles by $Q_1$ and $Q_3$ we see that

$Q_1 - Me = -1.76$ inches
$Q_3 - Me = 1.74$ inches

so that the median is almost half-way between the quartiles, an indication of the symmetry
of the distribution.
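
The interpolation used in Examples 2.4 and 2.5 may be sketched as a single routine. The function name and argument layout are assumptions for illustration; the lower boundary of the first class is taken as 56 15/16 inches, in keeping with the class boundaries used in those examples.

```python
# Frequencies of Table 1.7, for the classes 57-, 58-, ..., 77- inches.
FREQ = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
        1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
LOWER_BOUND = 56.9375  # 56 15/16 inches, lower boundary of the first class


def grouped_quantile(freqs, lower_bound, width, proportion):
    """Variate-value below which `proportion` of the total frequency lies,
    by linear interpolation within the class containing it."""
    target = proportion * sum(freqs)
    cumulated = 0.0
    for i, f in enumerate(freqs):
        if cumulated + f >= target:
            return lower_bound + width * (i + (target - cumulated) / f)
        cumulated += f
    raise ValueError("proportion must lie between 0 and 1")


if __name__ == "__main__":
    for name, prop in (("lower quartile", 0.25), ("median", 0.5),
                       ("upper quartile", 0.75)):
        print(name, grouped_quantile(FREQ, LOWER_BOUND, 1.0, prop))
```

The same routine gives deciles or any other quantile by changing the proportion. Small differences in the last decimal place from the figures quoted in the examples may arise from rounding.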


The Distribution Curve or Ogive of Galton 


2.13. The quantiles may also be determined graphically. Suppose we plot x, the
variate, along a horizontal axis and $\Sigma f(x)$, the cumulated frequency up to and including
x, along the perpendicular y-axis. We then get a series of points through which, in general,
a smooth curve may be drawn. This curve, as is evident from its definition, is
$$y = F(x),$$
i.e. the graph of the distribution function. It is sometimes called the graduation curve,
or Galton's ogive (though it is only shaped like an ogive in certain cases such as that of
a unimodal symmetrical curve). We shall use the expression “distribution curve.”

Fig. 2.1 illustrates the distribution curve for the J-shaped distribution of Table 1.2, and
Fig. 2.2 that for the symmetrical distribution of Table 1.7. A freehand curve has been
drawn in both cases.

[Fig. 2.1. Distribution Curve of the Data of Table 1.2. Horizontal axis: Annual Income (£000); vertical axis: Cumulated Frequency (thousands).]

[Fig. 2.2. Distribution Curve of the Data of Table 1.7. Horizontal axis: Height (inches); vertical axis: Cumulated Frequency. (Heights shown correspond to entries in the Table, e.g. cumulated frequency at 64 inches is the frequency up to and including the range 64- and therefore up to 64 15/16 inches.)]

Curves of this kind can be used to determine the quantiles. To find the median,
we merely have to find the abscissa corresponding to the ordinate N/2,
and so on. The positions of the quartiles and the median are shown in Fig. 2.2, and
the reader may care to compare the values obtained by reading the graph by eye with
those given in Example 2.5.


Measures of Dispersion

2.14. We now proceed to consider the quantities which have been proposed to
measure the dispersion of a distribution. They fall into three groups:

(a) Measures of the distance (in terms of the variate) between certain representative
values, such as the range, the interdecile range or the interquartile range.

(b) Measures compiled from the deviations of every member of the population from
some central value, such as the mean deviation from the mean, the mean deviation from
the median, and the standard deviation.

(c) Measures compiled from the deviations of all the members of the population among
themselves, such as the mean difference.

In advanced theory the outstandingly important measure is the standard deviation;
but they all require some mention.

Range and Interquantile Differences 


2.15. The range of a distribution is the difference of the greatest and least variate-values
borne by its members. As a descriptive parameter of a population it has very little
use. A knowledge of the whereabouts of the end values obviously tells little about the
way the bulk of the distribution is condensed inside the range; and for distributions of
infinite range it is obviously wholly inappropriate.

More useful rough-and-ready measures may be obtained from the quantiles, and there
are two such in general use. The interquartile range is the distance between the upper
and lower quartiles, and is thus a range which contains one-half the total frequency.
The interdecile range (or perhaps, more accurately, the 1-9th interdecile range) is the distance
between the first and the ninth decile. Both these measures evidently give some approximate
idea of the “spread” of a distribution, and are easily calculable. For this reason
they are fairly generally used in elementary descriptive statistics. In advanced theory
they suffer from the disadvantage of being difficult to handle mathematically in the
theory of sampling.


Mean Deviations 


2.16. The amount of scatter in a population is evidently measured to some extent
by the totality of deviations from the mean. We have seen (2.5) that the sum of these
deviations taken with appropriate sign is zero. We may however write
$$\delta_1 = \int_{-\infty}^{\infty} |x - \mu_1'|\,dF, \qquad (2.13)$$
where the deviations are now taken absolutely, and define $\delta_1$ to be a coefficient of dispersion.
We shall call it the mean deviation about the mean.

Similarly for the median Me. We may write
$$\delta_2 = \int_{-\infty}^{\infty} |x - Me|\,dF \qquad (2.14)$$
and call $\delta_2$ the mean deviation about the median.
In future the words “mean deviation” alone will be taken to refer to the mean
deviation about the mean.

Both these measures have merits in elementary work, being fairly easily calculable.
Once again, however, they are practically excluded from advanced work by their intractability
in the theory of sampling.


Standard Deviation 
2.17. We have seen that the mean about an arbitrary point a is given by 


ду = [e — a)dF. 


We may, by analogy with the terminology of Statics, call this the first moment, and define 
the second moment by 
m =í (s — a)? dF o D О D . (2.15) 


The second moment about the mean is written without the prime, thus: 
2 


Has = (c—u?dF . STORE: a . (2.16) 
and is called the Variance. The positive square root of the variance is called the standard 
deviation, and usually denoted by с, so that we have 

"CRT CU. s (um 

The variance is thus the mean of the squares of deviations from the mean. The device
of squaring and then taking the square root of the resultant sum in order to obtain the
standard deviation may appear a little artificial, but it makes the mathematics of the
sampling theory very much simpler than is the case, for example, with the mean deviation.
The calculation of the variance and the standard deviation proceeds by an easy
extension of the methods used for the mean. In particular, if b is some arbitrary value

    \mu_2' (about a) = \int (x - a)^2 \, dF
                     = \int \{ (x - b)^2 + 2(b - a)(x - b) + (b - a)^2 \} \, dF
                     = \mu_2' (about b) + 2(b - a) \mu_1' (about b) + (b - a)^2.        (2.18)


If now b is the mean we have

    \mu_2' = \mu_2 + (\mu_1' - a)^2
or
    \mu_2 = \mu_2' - (\mu_1' - a)^2.        (2.19)

Thus the variance can easily be found from the second moment about an arbitrary point,
which can be selected to simplify the calculations.
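As an illustration of (2.19), the following Python sketch computes the variance of a small set of values twice: once with the origin at the mean and once with a convenient round origin. The data and the function names are hypothetical, chosen only to show that the two routes agree.

```python
def moments_about(values, a):
    """Return the first and second moments of `values` about the point `a`."""
    n = len(values)
    first = sum(x - a for x in values) / n
    second = sum((x - a) ** 2 for x in values) / n
    return first, second


def variance_from_origin(values, a):
    """Variance via (2.19): second moment about `a` minus the squared first moment about `a`."""
    first, second = moments_about(values, a)
    return second - first ** 2


# Hypothetical heights in inches, used only for illustration.
data = [61.0, 64.5, 66.0, 67.5, 70.0, 72.5]
mean = sum(data) / len(data)

about_mean = variance_from_origin(data, mean)   # origin at the mean itself
about_67 = variance_from_origin(data, 67.0)     # convenient round origin
print(about_mean, about_67)
```

The two printed values should coincide up to rounding error, whatever origin is chosen.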


Example 2.6

To find the mean deviation and the standard deviation for the distribution of men
according to height considered in Example 2.1 (Table 1.7).

In the case of the mean deviation for a grouped distribution, the sum of deviations
should first be calculated from the centre of the class-interval in which the mean lies and
then reduced to the mean as origin. It so happens that in Table 2.1 the mean fell in the
interval taken as origin, so that the preliminary arithmetic already exists in the Table.

The sum of positive deviations is 8763 and that of negative deviations -8584.
Hence the sum of deviations regardless of sign is 17,347, the unit being the class-interval
and the origin the centre of the interval.


40 MEASURES OF DISPERSION 


To reduce to the mean as origin, we note that if the number of observations below
the mean is N_1, and the number above the mean is N_2, and d = \mu_1' - a, we have to add N_1 d
to the sum of deviations about the centre of the interval and subtract N_2 d. In this case
d = 0.02 (Example 2.1), N_1 = 4918, N_2 = 3667. Hence we add (4918 - 3667) 0.02 = 25.
Hence the mean deviation

    \delta_1 = (17,347 + 25) / 8585 = 2.02 inches.

For the standard deviation some further calculation is required, as shown in Table 2.2.


TABLE 2.2

Calculation of the Standard Deviation for the Distribution of Table 1.7.
(Some preliminary calculation already carried out in Table 2.1.)

     (1)          (2)           (3)            (4)
   Height,     Frequency     Deviation
   inches.        f.           \xi.         \xi^2 f.

     57-            2          -10             200
     58-            4           -9             324
     59-           14           -8             896
     60-           41           -7           2,009
     61-           83           -6           2,988
     62-          169           -5           4,225
     63-          394           -4           6,304
     64-          669           -3           6,021
     65-          990           -2           3,960
     66-         1223           -1           1,223
     67-         1329            0               0
     68-         1230            1           1,230
     69-         1063            2           4,252
     70-          646            3           5,814
     71-          392            4           6,272
     72-          202            5           5,050
     73-           79            6           2,844
     74-           32            7           1,568
     75-           16            8           1,024
     76-            5            9             405
     77-            2           10             200

   TOTALS        8585           --          56,809


Column (4) shows the sum \sum \xi^2 f, where f is the actual frequency. Hence
the second moment about the arbitrary origin is

    \mu_2' = 56,809 / 8585 = 6.6172.

We have already found in Example 2.1 that

    \mu_1' - a = 179 / 8585 = 0.0209.

Hence, in virtue of (2.19),

    \mu_2 = 6.6172 - (0.0209)^2
          = 6.6168

    \sigma = \sqrt{\mu_2} = 2.57 inches.

It may be noted that the mean deviation is about 80 per cent. of the standard deviation. 
This relationship often holds approximately for unimodal curves approaching symmetry. 
The reason will become apparent when we study the so-called “ normal " distribution in 
Chapter 5. 
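The arithmetic of Example 2.6 can be retraced with a short Python sketch. The frequencies are those of Table 2.2; the variable names are illustrative only, and the mean deviation is computed directly about the mean rather than by the two-step reduction used in the text.

```python
from math import sqrt

# Frequencies of Table 1.7 as they appear in Table 2.2 (classes 57-, 58-, ..., 77-).
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]
deviations = list(range(-10, 11))   # xi, in class-intervals from the 67- class

N = sum(freqs)
sum_xi = sum(f * x for f, x in zip(freqs, deviations))
sum_xi2 = sum(f * x * x for f, x in zip(freqs, deviations))

d = sum_xi / N                      # mean measured from the working origin
mu2 = sum_xi2 / N - d ** 2          # equation (2.19)
sigma = sqrt(mu2)
mean_dev = sum(f * abs(x - d) for f, x in zip(freqs, deviations)) / N

print(N, sum_xi, sum_xi2)
print(round(sigma, 2), round(mean_dev, 2), round(mean_dev / sigma, 2))
```

The last figure printed is the ratio of mean deviation to standard deviation discussed above.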


Example 2.7 


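To find the variance of the distribution

    dF = \frac{1}{B(p, q)} (1 - x)^{p-1} x^{q-1} \, dx,        0 \le x \le 1.

We have, about the origin,

    \mu_2' = \frac{1}{B(p, q)} \int_0^1 (1 - x)^{p-1} x^{q+1} \, dx
           = \frac{B(p, q + 2)}{B(p, q)} = \frac{(q + 1) q}{(p + q + 1)(p + q)}.

We have already found (Example 2.2) that

    \mu_1' = \frac{q}{p + q}.

Thus

    \mu_2 = \mu_2' - \mu_1'^2
          = \frac{(q + 1) q}{(p + q + 1)(p + q)} - \frac{q^2}{(p + q)^2}
          = \frac{p q}{(p + q + 1)(p + q)^2}.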


Sheppard’s Corrections 


2.18. The treatment of the values of a grouped frequency-distribution as if they
were concentrated at the mid-points of intervals is an approximation, and in certain
circumstances it is possible to make corrections for any distortion introduced thereby.
These so-called "Sheppard's corrections" will be discussed at length in the next chapter,
but at this stage we may indicate without proof the appropriate correction for the second
moment.

If the distribution is continuous and has high order contact with the variate-axis
at its extremities, i.e. if it "tails off" slowly, the crude second moment calculated from
grouped frequencies should be corrected by subtracting from it h^2/12, where h is the width
of the interval. For example, in the height data of Example 2.6, we have h = 1, and the
corrected second moment is

    6.6168 - 0.0833 = 6.5335.

The corrected value of \sigma is \sqrt{6.5335} = 2.56, as against an uncorrected value of 2.57.
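A minimal Python sketch of the correction just described, applied to the figures of Example 2.6. The function name is hypothetical; the only inputs are the crude second moment and the interval width h quoted in the text.

```python
from math import sqrt


def sheppard_corrected_second_moment(crude_mu2, h):
    """Subtract h^2/12 from the crude second moment of a grouped distribution."""
    return crude_mu2 - h ** 2 / 12


crude = 6.6168                      # uncorrected variance from Example 2.6
corrected = sheppard_corrected_second_moment(crude, h=1.0)
print(round(corrected, 4), round(sqrt(corrected), 2), round(sqrt(crude), 2))
```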


Mean Difference


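2.19. The coefficient of mean difference (not to be confused with mean deviation)
is defined by

    \Delta_1 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} | x - y | \, dF(x) \, dF(y)
             = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} | x - y | f(x) f(y) \, dx \, dy.        (2.20)

In the discontinuous case two different formulae arise. We have either

    \Delta_1 = \frac{1}{N(N - 1)} \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} | x_j - x_k | f(x_j) f(x_k),    j \ne k,        (2.21)

the mean difference without repetition, or

    \Delta_1 = \frac{1}{N^2} \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} | x_j - x_k | f(x_j) f(x_k),        (2.22)

the mean difference with repetition. The difference lies only in the divisor and is
unimportant if N is large.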

The mean difference is the average of the differences of all the possible pairs of variate- 
values, taken regardless of sign. In the coefficient with repetition each value is taken 
with itself, adding of course nothing to the sum of deviations, but resulting in the total 
number of pairs being N?. In the coefficient without repetition only different values are 
taken, so that the number of pairs is N(N — 1). Hence the divisors in (2.21) and (2.22). 
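The two divisors can be seen at work in a small Python sketch. The sample and the function name are hypothetical; each observation is treated as a separate value with unit frequency, so the double sum runs over all ordered pairs.

```python
from itertools import product


def mean_difference(values, with_repetition=True):
    """Mean absolute difference over all ordered pairs of values.

    With repetition the divisor is N^2 (each value is also paired with itself);
    without repetition it is N(N - 1).
    """
    n = len(values)
    total = sum(abs(x - y) for x, y in product(values, repeat=2))
    return total / (n * n if with_repetition else n * (n - 1))


sample = [1.0, 2.0, 4.0, 7.0]       # hypothetical variate-values
with_rep = mean_difference(sample, with_repetition=True)
without_rep = mean_difference(sample, with_repetition=False)
print(with_rep, without_rep)
```

For this sample the sum over ordered pairs is 40, so the two coefficients are 40/16 and 40/12 respectively.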

2.20. The mean difference, which is due to Gini (1912), has a certain theoretical
attraction, being dependent on the spread of the variate-values among themselves and
not on the deviations from some central value. It is, however, more difficult to compute
than the standard deviation, and the appearance of the absolute values in the defining
equations indicates, as for the mean deviation, the appearance of difficulties in the theory
of sampling. It might be thought that this inconvenience could be overcome by the
definition of a coefficient

    B = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - y)^2 \, dF(x) \, dF(y).

This, however, is nothing but twice the variance. For

    B = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x^2 - 2xy + y^2) \, dF(x) \, dF(y)
      = \int x^2 \, dF(x) \int dF(y) - 2 \int x \, dF(x) \int y \, dF(y) + \int dF(x) \int y^2 \, dF(y)
      = 2\mu_2' - 2\mu_1'^2
      = 2\mu_2.        (2.23)


This interesting relation shows that the variance may in fact be defined as half the 
mean square of all possible variate differences, that is to say, without reference to deviations 
from a central value, the mean. 
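The relation (2.23) is easy to confirm numerically. The sketch below, with a hypothetical sample and illustrative function names, compares half the mean square of all pairwise differences (taken with repetition) with the variance computed in the ordinary way.

```python
def half_mean_square_difference(values):
    """Half the mean of (x - y)^2 over all ordered pairs, including x paired with itself."""
    n = len(values)
    return sum((x - y) ** 2 for x in values for y in values) / (2 * n * n)


def variance(values):
    """Mean of squared deviations from the mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


sample = [1.0, 2.0, 4.0, 7.0]       # hypothetical variate-values
print(half_mean_square_difference(sample), variance(sample))
```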




Coefficients of Variation : Standard Measure 

2.21. The foregoing measures of dispersion have all been expressed in terms of units
of the variate. It is thus difficult to compare dispersions in different populations unless
the units happen to be identical; and this has led to a search for measures which shall
be independent of the variate scale, that is to say, shall be pure numbers.

Several coefficients of this kind may be constructed, such as the

    (mean deviation) / mean    or    (mean deviation) / median.

Only two have been used at all extensively in practice, Karl Pearson's
coefficient of variation, defined by

    V = 100 \frac{\sigma}{\mu_1'}        (2.24)

and Gini's coefficient of concentration, defined by

    G = \frac{\Delta_1}{2\mu_1'}.        (2.25)

Both these coefficients suffer from the disadvantage of being affected very much by \mu_1',
the value of the mean measured from some arbitrary origin, and are hardly suitable for
advanced work.


2.22. For our purposes, comparability may be attained in a somewhat different way.
Let us take \sigma itself as a new unit and express the frequency function in terms of a new
variable \xi related to x by

    \xi = \frac{x - \mu_1'}{\sigma}.        (2.26)

Any distribution expressed in this way has zero mean and unit variance. It is then said
to be expressed in standard measure. Two distributions in standard measure can be readily
compared in regard to form, skewness, and other qualities, though not of course in regard
to mean and variance.


Concentration 
2.23. Gini's coefficient of concentration arises in a natural way from the following
approach:

Writing, as usual

    F(x) = \int_{-\infty}^{x} f(x) \, dx        (2.27)

let us define

    \Phi(x) = \frac{1}{\mu_1'} \int_{-\infty}^{x} x f(x) \, dx.        (2.28)

\Phi(x) exists, of course, only if \mu_1' exists. Just as F(x) varies from 0 to 1, \Phi(x) varies from
0 to 1 provided that the origin is taken to the left of the start of the frequency-distribution,
which we shall assume to be so. \Phi(x) may be called the incomplete first moment.

Now (2.27) and (2.28) may be regarded as defining a relationship between the
variables F and \Phi in terms of parametric equations in x.* The curve whose ordinate
and abscissa are \Phi and F is called the curve of concentration. Such a curve is shown
in Fig. 2.3.

* The definition of curves by parametric equations will be found treated in most textbooks of
differential calculus. The term "parameter" in this connection is usual in mathematics, but is not
to be confused with the more special statistical parameter as defined in 2.2.

[Fig. 2.3. Curve of Concentration. Abscissa F, ordinate \Phi, each running from 0 to 1.]


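The curve of concentration must be convex to the F-axis, for we have

    \frac{d\Phi}{dF} = \frac{d\Phi / dx}{dF / dx} = \frac{x}{\mu_1'},

which is positive since our origin is taken to the left of the start of the distribution, and

    \frac{d^2\Phi}{dF^2} = \frac{1}{\mu_1'} \frac{dx}{dF} = \frac{1}{\mu_1' f(x)} = positive.

Thus the tangent to the curve makes a positive acute angle with the F-axis, and the angle
increases as F increases; in other words, the curve is convex to the F-axis.

The area between the concentration curve and the line F = \Phi is called the area
of concentration. We proceed to show that it is equal to one-half the coefficient of
concentration.

In fact, we have from Fig. 2.3

    2 (area of concentration) = \int_0^1 F \, d\Phi - \int_0^1 \Phi \, dF

and thus

    2\mu_1' (area) = \int_{-\infty}^{\infty} x F(x) \, dF(x) - \mu_1' \int_{-\infty}^{\infty} \Phi(x) \, dF(x)
                   = \int_{-\infty}^{\infty} x \, dF(x) \int_{-\infty}^{x} dF(y) - \int_{-\infty}^{\infty} dF(x) \int_{-\infty}^{x} y \, dF(y)
                   = \int_{-\infty}^{\infty} \int_{-\infty}^{x} (x - y) \, dF(y) \, dF(x).

Now \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - y) \, dF(x) \, dF(y) = 0, and hence

    2\mu_1' (area) = \frac{1}{2} \left[ \int_{-\infty}^{\infty} \int_{-\infty}^{x} (x - y) \, dF(y) \, dF(x) + \int_{-\infty}^{\infty} \int_{x}^{\infty} (y - x) \, dF(y) \, dF(x) \right]
                   = \frac{1}{2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} | x - y | \, dF(x) \, dF(y)
                   = \frac{1}{2} \Delta_1.

Thus

    area of concentration = \frac{\Delta_1}{4\mu_1'} = \frac{1}{2} \cdot \frac{\Delta_1}{2\mu_1'},

that is, one-half the coefficient of concentration.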


2.24. Various methods have been given for calculating the mean difference. The
following is probably the simplest, particularly for distributions specified in equal
group-intervals.

Let us, without loss of generality, take an origin at the start of the distribution. We
may then write

    \sum_{j,k=1}^{N} | x_j - x_k | = 2 \sum' (x_j - x_k),

the summation \sum' being taken over values such that j > k. We have also

    x_j - x_k = (x_j - x_{j-1}) + (x_{j-1} - x_{j-2}) + ... + (x_{k+1} - x_k).

Thus

    \sum' (x_j - x_k) = \sum_{h=1}^{N-1} C_h (x_{h+1} - x_h),

where C_h is the number of terms of type (x_j - x_k) in \sum' containing x_{h+1} - x_h. Since h is
the number of values of k less than or equal to h (the origin being at the start of the
distribution) and N - h the number of values of j greater than or equal to h + 1,
C_h = h(N - h), and thus

    \Delta_1 = \frac{2}{N^2} \sum_{h=1}^{N-1} h (N - h)(x_{h+1} - x_h).        (2.29)

This form is particularly useful if all the intervals are equal. F_h being the distribution
function of x_h, we then have (the interval being taken as unit)

    \Delta_1 = \frac{2}{N^2} \sum_{h=1}^{N-1} N F_h (N - N F_h)
             = 2 \sum_{h=1}^{N-1} F_h (1 - F_h).        (2.30)

If the actual cumulated frequency for x_h is G_h we have

    \Delta_1 = \frac{2}{N^2} \sum_{h=1}^{N-1} G_h (N - G_h),        (2.31)

the most convenient form in practice.


Example 2.8

Returning once more to the height distribution considered in previous examples, we
may calculate \sum G_h (N - G_h) as in Table 2.3.


TABLE 2.3

Calculation of the Mean Difference for the Height Distribution of Table 1.7.

   Height,     Frequency.      G_h.       N - G_h.     G_h (N - G_h).
   inches.

     57-            2             2         8583             17,166
     58-            4             6         8579             51,474
     59-           14            20         8565            171,300
     60-           41            61         8524            519,964
     61-           83           144         8441          1,215,504
     62-          169           313         8272          2,589,136
     63-          394           707         7878          5,569,746
     64-          669          1376         7209          9,919,584
     65-          990          2366         6219         14,714,154
     66-         1223          3589         4996         17,930,644
     67-         1329          4918         3667         18,034,306
     68-         1230          6148         2437         14,982,676
     69-         1063          7211         1374          9,907,914
     70-          646          7857          728          5,719,896
     71-          392          8249          336          2,771,664
     72-          202          8451          134          1,132,434
     73-           79          8530           55            469,150
     74-           32          8562           23            196,926
     75-           16          8578            7             60,046
     76-            5          8583            2             17,166
     77-            2          8585           --                 --

   TOTALS        8585                                   105,990,850


We have, from (2.31), for the mean difference with repetition,

    \Delta_1 = \frac{2 \times 105,990,850}{8585^2}
             = 2.88 inches

as against a mean deviation of 2.02 inches and a standard deviation of 2.57 inches (Example
2.6). There is, of course, nothing inconsistent in the difference between these values.
The coefficients are different in nature, and there is no reason why their numerical values
in any particular case should approach equality.
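The cumulative-frequency form (2.31) can be checked against the defining double sum (2.22) with a few lines of Python. This is a sketch only: the frequencies are those of Table 2.3, the class-interval of one inch is taken as unit, and the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies of the height distribution (classes 57-, 58-, ..., 77-), from Table 2.3.
freqs = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
         1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]

N = sum(freqs)
G = list(accumulate(freqs))                     # cumulated frequencies G_h
total = sum(g * (N - g) for g in G)             # the final class contributes zero
delta_1 = 2 * total / N ** 2                    # equation (2.31)

# Direct evaluation of (2.22): every pair of classes, weighted by both frequencies.
brute = sum(fj * fk * abs(j - k)
            for j, fj in enumerate(freqs)
            for k, fk in enumerate(freqs)) / N ** 2

print(total, round(delta_1, 2), round(brute, 2))
```

The agreement of the two results reflects the counting argument of 2.24: each boundary between adjacent classes is straddled by G_h (N - G_h) pairs of observations.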


NOTES AND REFERENCES

The relationship between mean, median and mode expressed in equation (2.11) was
discussed from the mathematical point of view by Doodson (1917), who showed that it
holds as a first approximation for continuous distributions deviating only moderately from
symmetry.

It was shown by Dunham Jackson (1921) that the indeterminacy in the definition of
the median can be removed by a more sophisticated mathematical approach. He showed
that for N values x_1 ... x_N the sum \sum_{j=1}^{N} | \xi - x_j |^p, considered as a function of \xi, has
a minimum for some unique \xi_p if p > 1; and further that as p \to 1, \xi_p tends to some
unique value, which may be defined as the median.

The proof of the increasing character of the function F(t) of 2.7 is due to Norris (1935),
who gives references to earlier proofs.

The work of the Italian school on concentration does not appear to have been treated
in English books. The fundamental memoir is that of Gini (1912), who has returned to
the subject in subsequent papers, many of them in Metron. For methods of calculating
the mean difference, see de Finetti and Paciello (1930).


de Finetti, B., and Paciello, U. (1930), "Sui metodi proposti per il calcolo della differenza
media," Metron, 8, part 3, 89.

Doodson, A. T. (1917), "Relation of Mode, Median and Mean in Frequency Curves,"
Biometrika, 11, 425.

Gini, C. (1912), "Variabilità e Mutabilità," Studi Economico-Giuridici della R. Università
di Cagliari, Anno 3, part 2, p. 80.

Gini, C., and Galvani, L. (1929), "Di talune estensioni dei concetti di media ai caratteri
qualitativi," Metron, 8, parts 1-2, 3.

Jackson, Dunham (1921), "Note on the median of a set of numbers," Bull. Amer. Math.
Soc., 27, 160.

Norris, Nilan (1935), "Inequalities among averages," Ann. Math. Stats., 6, 27.


EXERCISES 


2.1. Show that the mean deviation about an arbitrary point is least when that point 


is the median. 
2.2. Show that the mean (about the origin) of the discontinuous distribution whose
frequencies at 0, 1, 2, ... r, ... are

    e^{-m}, \frac{m}{1!} e^{-m}, \frac{m^2}{2!} e^{-m}, ..., \frac{m^r}{r!} e^{-m}, ...

is m, and that the variance is also m.


2.3. Show that, if deviations are small compared with the value of the mean, we
have approximately, for the Geometric and Harmonic means,

    G = \mu_1' \left( 1 - \frac{1}{2} \frac{\sigma^2}{\mu_1'^2} \right)

    H = \mu_1' \left( 1 - \frac{\sigma^2}{\mu_1'^2} \right)

and hence that

    \mu_1' - 2G + H = 0.

2.4. Show that the mean deviation about the mean is not greater than the standard
deviation.
2.5. Show that for the "rectangular" population

    dF = dx,        0 \le x \le 1,

    \mu_1' (about the origin) = 1/2
    \mu_2 = 1/12
    mean deviation = 1/4
    \Delta_1 = 1/3.


48 MEASURES OF LOCATION AND DISPERSION
2.6. Show that for the distribution

    dF = \frac{1}{\sigma} e^{-x/\sigma} \, dx,        0 \le x \le \infty,

the mean, standard deviation and mean difference are all equal to \sigma; and that the
interquartile range is \sigma \log_e 3.


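2.7. Show that for the distribution

    dF = y_0 e^{-y^2/2} y^{\nu - 1} \, dy,        0 \le y \le \infty,

    \mu_1' (about the origin) = \sqrt{2} \, \Gamma\left( \frac{\nu + 1}{2} \right) / \Gamma\left( \frac{\nu}{2} \right)
    \mu_2' = \nu.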


2.8. Show that if a range of six times the standard deviation contains at least 
18 class-intervals, Sheppard's correction will make a difference of less than 0-5 per cent. 
in the uncorrected value of the standard deviation. 


2.9. Show that for a continuous distribution

    \Delta_1 = 2 \int_{-\infty}^{\infty} F(x) \{ 1 - F(x) \} \, dx.


2.10. If the variate-values of a distribution are x_1 ... x_N in ascending order of
magnitude and


8, = 258) U= 8, 
ј=1 r=] 
r N 
„= D>, ы V= 258. 
j=l r=] 
2 
then d= үз — U) 
2 
= yet + Lu, — 20}. 


^ 


Py 


CHAPTER 3 


MOMENTS AND CUMULANTS 
Definition of Moments 
3.1. In the previous chapter we defined the first moment (arithmetic mean) about
an arbitrary point a by the Stieltjes integral

    \mu_1' = \int_{-\infty}^{\infty} (x - a) \, dF        (3.1)

and the second moment about the point by

    \mu_2' = \int_{-\infty}^{\infty} (x - a)^2 \, dF.        (3.2)

In generalisation of these equations we may define a series of coefficients \mu_r', r = 1, 2 ...,
by the relation

    \mu_r' = \int_{-\infty}^{\infty} (x - a)^r \, dF.        (3.3)

\mu_r' is called the moment of order r about the point a. When a is the mean \mu_1' we write
the moment without the prime,

    \mu_r = \int_{-\infty}^{\infty} (x - \mu_1')^r \, dF.        (3.4)

In particular

    \mu_1 = 0,

and we may also define a moment of zero order

    \mu_0 = \mu_0' = \int_{-\infty}^{\infty} dF = 1.

It is assumed that when reference is made to the rth moment of a particular distribution,
the appropriate integral (3.3) converges for that distribution. As will be seen later, some
of the theoretical distributions encountered in statistics do not possess moments of all
orders; some, in fact, possess only a few moments of low order, and one or two do not
possess any, except of course the moment of order zero.

3.2. If a and b are two variate-values, let b - a = c and denote the moments about
a and b by \mu'(a) and \mu'(b) respectively. Then we have, by the binomial theorem,

    (x - a)^r = (x - b + c)^r
              = \sum_{j=0}^{r} \binom{r}{j} (x - b)^{r-j} c^j.

Hence

    \mu_r'(a) = \int (x - a)^r \, dF
              = \int \sum_{j=0}^{r} \binom{r}{j} (x - b)^{r-j} c^j \, dF
              = \sum_{j=0}^{r} \binom{r}{j} c^j \int (x - b)^{r-j} \, dF
              = \sum_{j=0}^{r} \binom{r}{j} \mu_{r-j}'(b) c^j.        (3.5)

This equation gives the rth moment about a in terms of the rth and lower moments about
b. It may be written in a symbolic form which will be found to provide a useful mnemonic,
namely

    \mu_r'(a) = \{ \mu'(b) + c \}^r

with the convention that the expression on the right is to be expanded binomially and
the form \{ \mu'(b) \}^j replaced by \mu_j'(b).

The equation (3.5) is of particular importance if one of the values a or b is the mean
of the distribution. In this case we have

    \mu_r' = \sum_{j=0}^{r} \binom{r}{j} \mu_{r-j} \mu_1'^j        (3.6)

    \mu_r = \sum_{j=0}^{r} \binom{r}{j} \mu_{r-j}' (-\mu_1')^j.        (3.7)

In particular

    \mu_2' = \mu_2 + \mu_1'^2
    \mu_3' = \mu_3 + 3\mu_1' \mu_2 + \mu_1'^3        (3.8)
    \mu_4' = \mu_4 + 4\mu_1' \mu_3 + 6\mu_1'^2 \mu_2 + \mu_1'^4

and

    \mu_2 = \mu_2' - \mu_1'^2
    \mu_3 = \mu_3' - 3\mu_1' \mu_2' + 2\mu_1'^3        (3.9)
    \mu_4 = \mu_4' - 4\mu_1' \mu_3' + 6\mu_1'^2 \mu_2' - 3\mu_1'^4
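Equations (3.6) and (3.7) translate directly into a pair of small Python functions. This is a sketch with hypothetical names: moments are passed as lists indexed by order, with the moment of order zero equal to 1, and the numerical values used to exercise the functions are the moments about the working mean quoted in Example 3.1.

```python
from math import comb


def central_from_raw(raw):
    """Moments about the mean from moments about an arbitrary point, by (3.7).

    raw[r] is the r-th moment about the arbitrary point, with raw[0] == 1.
    """
    m1 = raw[1]
    return [sum(comb(r, j) * raw[r - j] * (-m1) ** j for j in range(r + 1))
            for r in range(len(raw))]


def raw_from_central(central, m1):
    """Moments about a point from moments about the mean, by (3.6).

    m1 is the mean measured from that point; central[0] == 1 and central[1] == 0.
    """
    return [sum(comb(r, j) * central[r - j] * m1 ** j for j in range(r + 1))
            for r in range(len(central))]


# Moments about the working mean from Example 3.1 (class-interval units).
raw = [1.0, 0.294355253, 7.143622115, 42.408873807, 454.980075219]
central = central_from_raw(raw)
print([round(m, 6) for m in central])
print([round(m, 6) for m in raw_from_central(central, raw[1])])
```

The second line printed should reproduce the input, since (3.6) and (3.7) are inverse to one another.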

Calculation of Moments 


3.3. For a distribution specified numerically in a frequency table the calculation 
of moments of third and higher orders is akin to that of the first and second moments. 
For grouped data (high order moments are hardly ever required for ungrouped data) the 
observations are regarded as concentrated at the mid-points of intervals; a convenient 
arbitrary origin a is chosen, the moments about a calculated, and then if necessary the
moments about the mean are ascertained from (3.6) or (3.7). The effect of grouping may 
be corrected for in certain cases. 

In practice numerical moments of order higher than the fourth are rarely required, 
being so sensitive to sampling fluctuations that values computed from moderate numbers 
of observations are subject to a large margin of error. 


There are two methods in general use for arriving at the moments about an arbitrary
origin. The first is an immediate generalisation of the methods used in Chapter 2 for the
first two moments. The second will be considered in 3.10 in connection with factorial
moments.


Example 3.1

To find the first four moments about the mean of the distribution of Australian
marriages of Table 1.8.

Until the last stage we work in units of three years, the variate interval. A working
mean is taken at 28.5 years. To check the arithmetic we use an identity of type

    (x + 1)^3 = x^3 + 3x^2 + 3x + 1
    (x + 1)^4 = x^4 + 4x^3 + 6x^2 + 4x + 1.

Thus, for instance, the value of g(x)(x + 1)^r is found in addition to that of g(x) x^r and the
two checked by identities such as

    \sum g(x)(x + 1)^3 = \sum g(x) x^3 + 3 \sum g(x) x^2 + 3 \sum g(x) x + \sum g(x),

g(x) being the actual frequencies. The arithmetic work is shown in Table 3.1.


TABLE 3.1

Calculation of the First Four Moments of the Distribution of Marriages of Table 1.8.

Mid-value
of interval,
   years.          g.     x.        xg.   (x+1)g.       x^2 g.  (x+1)^2 g.        x^3 g.   (x+1)^3 g.         x^4 g.   (x+1)^4 g.

    16.5          294     -4     -1,176      -882        4,704       2,646       -18,816       -7,938         75,264        23,814
    19.5       10,995     -3    -32,985   -21,990       98,955      43,980      -296,865      -87,960        890,595       175,920
    22.5       61,001     -2   -122,002   -61,001      244,004      61,001      -488,008      -61,001        976,016        61,001
    25.5       73,054     -1    -73,054         0       73,054           0       -73,054            0         73,054             0
    28.5       56,501      0          0    56,501            0      56,501             0       56,501              0        56,501
    31.5       33,478      1     33,478    66,956       33,478     133,912        33,478      267,824         33,478       535,648
    34.5       20,569      2     41,138    61,707       82,276     185,121       164,552      555,363        329,104     1,666,089
    37.5       14,281      3     42,843    57,124      128,529     228,496       385,587      913,984      1,156,761     3,655,936
    40.5        9,320      4     37,280    46,600      149,120     233,000       596,480    1,165,000      2,385,920     5,825,000
    43.5        6,236      5     31,180    37,416      155,900     224,496       779,500    1,346,976      3,897,500     8,081,856
    46.5        4,770      6     28,620    33,390      171,720     233,730     1,030,320    1,636,110      6,181,920    11,452,770
    49.5        3,620      7     25,340    28,960      177,380     231,680     1,241,660    1,853,440      8,691,620    14,827,520
    52.5        2,190      8     17,520    19,710      140,160     177,390     1,121,280    1,596,510      8,970,240    14,368,590
    55.5        1,655      9     14,895    16,550      134,055     165,500     1,206,495    1,655,000     10,858,455    16,550,000
    58.5        1,100     10     11,000    12,100      110,000     133,100     1,100,000    1,464,100     11,000,000    16,105,100
    61.5          810     11      8,910     9,720       98,010     116,640     1,078,110    1,399,680     11,859,210    16,796,160
    64.5          649     12      7,788     8,437       93,456     109,681     1,121,472    1,425,853     13,457,664    18,536,089
    67.5          487     13      6,331     6,818       82,303      95,452     1,069,939    1,336,328     13,909,207    18,708,592
    70.5          326     14      4,564     4,890       63,896      73,350       894,544    1,100,250     12,523,616    16,503,750
    73.5          211     15      3,165     3,376       47,475      54,016       712,125      864,256     10,681,875    13,828,096
    76.5          119     16      1,904     2,023       30,464      34,391       487,424      584,647      7,798,784     9,938,999
    79.5           73     17      1,241     1,314       21,097      23,652       358,649      425,736      6,097,033     7,663,248
    82.5           27     18        486       513        8,748       9,747       157,464      185,193      2,834,352     3,518,667
    85.5           14     19        266       280        5,054       5,600        96,026      112,000      1,824,494     2,240,000
    88.5            5     20        100       105        2,000       2,205        40,000       46,305        800,000       972,405

  TOTALS      301,785     --     88,832   390,617    2,155,838   2,635,287    12,798,362   19,834,157    137,306,162   202,091,751
From this table we find

    \sum (xg)     =      88,832
    \sum (x^2 g)  =   2,155,838
    \sum (x^3 g)  =  12,798,362
    \sum (x^4 g)  = 137,306,162.

These values will be found to check and we have, about the working mean, on dividing by
the total frequency 301,785,

    \mu_1' =   0.294,355,253
    \mu_2' =   7.143,622,115
    \mu_3' =  42.408,873,807
    \mu_4' = 454.980,075,219.

For the moments about the mean, substitution in equations (3.9) gives

    \mu_2 =   7.056,977
    \mu_3 =  36.151,595
    \mu_4 = 408.738,210.

These are expressed in class-intervals, which are units of three years. To express the
results in units of one year we multiply the rth moment by 3^r, e.g.

    \mu_2 = 7.056,977 \times 9 = 63.512,79.

3.4. If a distribution is specified mathematically the determination of moments is
equivalent to the evaluation of certain sums or integrals. It is usually necessary to
consider whether the moments exist. Some examples will illustrate the general principles
involved.

Example 3.2

Consider the so-called binomial distribution in which the frequencies of
values 0, h, 2h ... are the successive terms in the expansion of (q + p)^n, i.e. are

    q^n, \binom{n}{1} q^{n-1} p, \binom{n}{2} q^{n-2} p^2, ...

Taking an origin at the first term and working in units of h, we have

    \mu_1' = \sum_{j=0}^{n} j \binom{n}{j} q^{n-j} p^j

which may be written

    \mu_1' = p \frac{\partial}{\partial p} (q + p)^n
           = np (q + p)^{n-1}
           = np.

    \mu_2' = \sum_{j=0}^{n} j^2 \binom{n}{j} q^{n-j} p^j
           = \left( p \frac{\partial}{\partial p} \right)^2 (q + p)^n
           = np (q + p)^{n-1} + n(n - 1) p^2 (q + p)^{n-2}
           = n^2 p^2 + npq.

Hence

    \mu_2 = npq.

Similarly

    \mu_3' = \left( p \frac{\partial}{\partial p} \right)^3 (q + p)^n

etc., and it will be found that

    \mu_3 = npq(q - p)
    \mu_4 = 3 p^2 q^2 n^2 + pqn(1 - 6pq).
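The closed forms for the binomial can be confirmed by summing over the terms of the expansion directly. The following Python sketch uses hypothetical values n = 10 and p = 0.3 and illustrative names; h is taken as unit, as in the example.

```python
from math import comb


def binomial_moments(n, p):
    """Mean and central moments of orders 2, 3, 4 of the binomial (q + p)^n by direct summation."""
    q = 1 - p
    weights = [comb(n, j) * q ** (n - j) * p ** j for j in range(n + 1)]
    mean = sum(j * w for j, w in enumerate(weights))
    central = {r: sum((j - mean) ** r * w for j, w in enumerate(weights))
               for r in (2, 3, 4)}
    return mean, central


n, p = 10, 0.3                      # hypothetical parameters for illustration
q = 1 - p
mean, central = binomial_moments(n, p)

print(mean, n * p)
print(central[2], n * p * q)
print(central[3], n * p * q * (q - p))
print(central[4], 3 * p**2 * q**2 * n**2 + p * q * n * (1 - 6 * p * q))
```

Each line prints the summed value followed by the value given by the formula of Example 3.2.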

Example 3.3 
Consider the distribution

    dF = \frac{k \, dx}{(1 + x^2)^m},        -\infty \le x \le \infty,    m \ge 1.

This is a unimodal distribution symmetrical about x = 0. All existent moments of odd
order about the origin therefore vanish. The constant k is given by the equation

    1 = k \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)^m}
      = k \frac{\Gamma(1/2) \Gamma(m - 1/2)}{\Gamma(m)}.

The moment about the mean of order 2r, if it exists, is given by

    \mu_{2r} = k \int_{-\infty}^{\infty} \frac{x^{2r}}{(1 + x^2)^m} \, dx

and this integral converges if and only if

    2m > 2r + 1.

Thus the moments about the origin of order < (2m - 1) exist and those of higher order
do not.

If m = 1 it may be noted that the integral \int_{-\infty}^{\infty} \frac{k x \, dx}{1 + x^2} is not completely convergent,
i.e.

    \lim_{N \to \infty, n \to \infty} \int_{-n}^{N} \frac{x \, dx}{1 + x^2}

does not exist, although the principal value

    \lim_{n \to \infty} \int_{-n}^{n} \frac{x \, dx}{1 + x^2}

does exist and is equal to zero. It is a matter of convention whether we regard the
distribution as possessing a mean in this case. For m > 1 the mean exists and is located at
the origin.

Making the substitution z = \frac{1}{1 + x^2} in the formula for \mu_{2r}, we find

    \mu_{2r} = k \int_0^1 (1 - z)^{r - 1/2} z^{m - r - 3/2} \, dz
             = k \frac{\Gamma(r + 1/2) \Gamma(m - r - 1/2)}{\Gamma(m)}

and on substituting for k,

    \mu_{2r} = \frac{\Gamma(r + 1/2) \Gamma(m - r - 1/2)}{\Gamma(1/2) \Gamma(m - 1/2)},    if 2m > 2r + 1.
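The gamma-function result of Example 3.3 can be compared with a direct numerical integration. The sketch below is an illustration under stated assumptions: it uses the substitution x = tan(theta), which is a different change of variable from the one in the text, and a simple midpoint rule; the function names are hypothetical.

```python
from math import cos, gamma, pi, sin


def even_moment(m, r):
    """Moment of order 2r of dF = k dx / (1 + x^2)^m from the gamma-function formula."""
    if 2 * m <= 2 * r + 1:
        raise ValueError("the moment of order 2r exists only if 2m > 2r + 1")
    return gamma(r + 0.5) * gamma(m - r - 0.5) / (gamma(0.5) * gamma(m - 0.5))


def even_moment_numeric(m, r, steps=20000):
    """Same moment by the midpoint rule after putting x = tan(theta)."""
    h = pi / steps
    num = den = 0.0
    for i in range(steps):
        theta = -pi / 2 + (i + 0.5) * h
        c, s = cos(theta), sin(theta)
        den += c ** (2 * m - 2)                        # integrand of the normalising integral
        num += s ** (2 * r) * c ** (2 * m - 2 * r - 2)  # integrand of the moment integral
    return num / den


print(even_moment(3, 1), even_moment_numeric(3, 1))
```

For r = 1 the gamma ratio reduces to 1/(2m - 3), so with m = 3 both values should be close to one-third.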
Example 3.4 
Consider the "normal" distribution

    dF = \frac{1}{\sigma \sqrt{2\pi}} e^{-x^2 / 2\sigma^2} \, dx.

This is symmetrical about the origin. All moments exist, those of odd order vanishing.
Thus

    \mu_{2r} = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} x^{2r} e^{-x^2 / 2\sigma^2} \, dx.

This may be evaluated by partial integration, but a more expedient method is as follows:
Consider the integral

    M(t) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-x^2 / 2\sigma^2} \, dx
         = e^{t^2 \sigma^2 / 2}.

We have, for all real values of t,

    e^{tx} e^{-x^2 / 2\sigma^2} = \sum_{r=0}^{\infty} \frac{t^r x^r}{r!} e^{-x^2 / 2\sigma^2}.

The series on the right is uniformly convergent in x and may be integrated term by term
if the resulting series is uniformly convergent. We then have

    M(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!} \mu_r.

In other words, \mu_r is the coefficient of \frac{t^r}{r!} in e^{t^2 \sigma^2 / 2}, and hence

    \mu_{2r} = \frac{(2r)!}{2^r r!} \sigma^{2r}.
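The formula for the even moments of the normal distribution can be checked against a direct quadrature. The Python sketch below assumes a hypothetical value of sigma and uses a plain trapezoidal rule over a range of twelve standard deviations either side of the origin; the names are illustrative.

```python
from math import exp, factorial, pi, sqrt


def normal_even_moment(r, sigma):
    """Moment of order 2r of the normal distribution: (2r)! sigma^(2r) / (2^r r!)."""
    return factorial(2 * r) / (2 ** r * factorial(r)) * sigma ** (2 * r)


def normal_moment_numeric(order, sigma, steps=20000, width=12.0):
    """Moment of the given order by the trapezoidal rule on [-width*sigma, width*sigma]."""
    a = -width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x ** order * exp(-x * x / (2 * sigma * sigma))
    return total * h / (sigma * sqrt(2 * pi))


sigma = 1.5                         # hypothetical standard deviation
for r in (1, 2, 3):
    print(2 * r, normal_even_moment(r, sigma), normal_moment_numeric(2 * r, sigma))
print(3, normal_moment_numeric(3, sigma))   # odd orders should vanish
```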
Moment-generating Functions and Characteristic Functions 
3.5. The previous example 


a function is accordingly called a Moment-generating Function. 
fully in the next chapter. 


For many frequency functions the integral \int e^{tx} \, dF or the sum \sum_j e^{t x_j} f(x_j) may
not exist for real values of t. This is, for example, true of the function dF = k(1 + x^2)^{-m} dx,
for finite positive values of m. A more serviceable auxiliary function is

    \phi(t) = \int_{-\infty}^{\infty} e^{itx} \, dF        (t real).


This is known as the Characteristic Function and is of great theoretical importance. It
will be seen in Chapter 4 that under certain general conditions the characteristic function
determines and is completely determined by the distribution function. It also yields
many valuable results in the theory of sampling.

Since by the nature of the distribution function the integral \int dF converges,

    | \phi(t) | \le \int_{-\infty}^{\infty} | e^{itx} | \, dF \le 1

and hence the integral defining \phi(t) converges absolutely and uniformly in t. It may
therefore be integrated under the integral sign with respect to t, and differentiated
provided that the resulting expressions are uniformly convergent. We
have, for example, writing D_t for \frac{d}{dt},

    D_t^r \phi(t) = i^r \int_{-\infty}^{\infty} e^{itx} x^r \, dF,

and hence, putting t = 0,

    \mu_r' = (-i)^r [ D_t^r \phi(t) ]_{t=0}

provided that \mu_r' exists. If \phi(t) be expanded in powers of t, \mu_r' must thus be equal to the
coefficient of \frac{(it)^r}{r!} in the expansion. Thus the characteristic function is also a
moment-generating function.

Example 3.5

Consider again the binomial (q + p)^n. Taking h as unit, we have

    \phi(t) = \sum_{j=0}^{n} \binom{n}{j} q^{n-j} p^j e^{itj}
            = (q + p e^{it})^n.

Hence

    \mu_1' = \left[ (-i) \frac{d}{dt} (q + p e^{it})^n \right]_{t=0}
           = np

    \mu_2' = \left[ (-i)^2 \frac{d^2}{dt^2} (q + p e^{it})^n \right]_{t=0}
           = np + n(n - 1) p^2,

and so on.
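The differentiation of the characteristic function at t = 0 can be imitated numerically. The Python sketch below is an illustration only: it replaces the exact derivatives by central finite differences with a hypothetical step size, and applies them to the binomial characteristic function with hypothetical parameters n = 10, p = 0.3.

```python
import cmath


def binomial_cf(t, n, p):
    """Characteristic function (q + p e^{it})^n of the binomial, h being taken as unit."""
    return (1 - p + p * cmath.exp(1j * t)) ** n


def first_two_moments_from_cf(cf, step=1e-3):
    """Approximate mu_1' and mu_2' as (-i)^r times the r-th derivative of cf at t = 0."""
    d1 = (cf(step) - cf(-step)) / (2 * step)                 # central difference, first derivative
    d2 = (cf(step) - 2 * cf(0.0) + cf(-step)) / step ** 2    # central difference, second derivative
    return ((-1j) * d1).real, ((-1j) ** 2 * d2).real


n, p = 10, 0.3                      # hypothetical parameters for illustration
m1, m2 = first_two_moments_from_cf(lambda t: binomial_cf(t, n, p))
print(m1, n * p)
print(m2, n * p + n * (n - 1) * p ** 2)
```

The finite-difference values agree with np and np + n(n - 1)p^2 only to the accuracy of the step chosen.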
Example 3.6 
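Consider the distribution

    dF = \frac{\alpha^{\gamma}}{\Gamma(\gamma)} e^{-\alpha x} x^{\gamma - 1} \, dx,        0 \le x \le \infty;    \alpha > 0, \gamma > 0,

which is known as Pearson's Type III (cf. Chapter 6). The distribution may have a variety
of shapes, depending on the value of \gamma, but moments of all orders exist in virtue of the
convergence of the integral \int_0^{\infty} x^r e^{-x} \, dx, the \Gamma-function integral. We have then, for the
characteristic function,

    \phi(t) = \frac{\alpha^{\gamma}}{\Gamma(\gamma)} \int_0^{\infty} e^{itx - \alpha x} x^{\gamma - 1} \, dx.

By the substitution z = x(\alpha - it) this becomes

    \phi(t) = \frac{\alpha^{\gamma}}{\Gamma(\gamma) (\alpha - it)^{\gamma}} \int_0^{\infty} e^{-z} z^{\gamma - 1} \, dz
            = \frac{1}{\left( 1 - \frac{it}{\alpha} \right)^{\gamma}}

since \int_0^{\infty} e^{-z} z^{\gamma - 1} \, dz = \Gamma(\gamma) whether z is real or complex.

Hence

    \phi(t) = 1 + \gamma \frac{it}{\alpha} + \frac{\gamma (\gamma + 1)}{2!} \left( \frac{it}{\alpha} \right)^2 + ...

and thus

    \mu_1' = \frac{\gamma}{\alpha}

    \mu_2' = \frac{\gamma (\gamma + 1)}{\alpha^2}

    \mu_3' = \frac{\gamma (\gamma + 1)(\gamma + 2)}{\alpha^3}

and so on. In particular,

    \mu_2 = \frac{\gamma}{\alpha^2}

    \mu_3 = \frac{2\gamma}{\alpha^3}.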

Absolute Moments 
3.6. The quantity

    \nu_r' = \int_{-\infty}^{\infty} | x - a |^r \, dF

is called the absolute moment of order r about a. The absolute moments about the mean
are written without primes.

If r is even, the absolute moment is clearly equal to the ordinary moment, and if the
range of the distribution is positive the absolute moments about any point to the left of
the start of the distribution are equal to the ordinary moments of corresponding order.

There are some interesting inequalities concerning the absolute moments. Referring
to the function E(t) of 2.7 and remembering that it is a non-decreasing function of t, we
find, on putting t = 1, 2, ..., that

    \nu_1' \le (\nu_2')^{1/2} \le (\nu_3')^{1/3} \le ... \le (\nu_r')^{1/r}.        (3.13)

A more general inequality, due to Liapounoff, is

    (\nu_b')^{a-c} \le (\nu_c')^{a-b} (\nu_a')^{b-c},        a \ge b \ge c \ge 0.        (3.14)

A proof of this result is sketched in Exercise 3.14. In particular, putting b = \frac{1}{2}(a + c)
we find

    \left( \nu_{(a+c)/2}' \right)^2 \le \nu_c' \nu_a'.        (3.15)

Further, putting c = 0 in (3.14) we find

    (\nu_b')^a \le (\nu_a')^b
or
    (\nu_b')^{1/b} \le (\nu_a')^{1/a}

which is equivalent to (3.13).
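The inequalities (3.13) and (3.14) may be tried out on any set of numbers. The following Python sketch uses a hypothetical sample and illustrative function names; each observation is given equal weight, and the absolute moments are taken about the origin.

```python
def absolute_moment(values, order, about=0.0):
    """Absolute moment of the given order about the point `about`, each value weighted equally."""
    return sum(abs(x - about) ** order for x in values) / len(values)


def liapounoff_gap(values, a, b, c, about=0.0):
    """Right side minus left side of (3.14); should be non-negative when a >= b >= c >= 0."""
    nu_a = absolute_moment(values, a, about)
    nu_b = absolute_moment(values, b, about)
    nu_c = absolute_moment(values, c, about)
    return nu_c ** (a - b) * nu_a ** (b - c) - nu_b ** (a - c)


sample = [-2.0, -0.5, 0.3, 1.1, 4.0]            # hypothetical observations
roots = [absolute_moment(sample, r) ** (1 / r) for r in (1, 2, 3, 4)]

print(roots)                                     # should be non-decreasing, as in (3.13)
print(liapounoff_gap(sample, 4, 2, 1), liapounoff_gap(sample, 3, 2, 0))
```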
Factorial Moments 
3.7. The factorial expression

    x(x - h)(x - 2h) ... (x - \overline{r - 1} h)

may conveniently be written x^{[r]}, a notation which brings out an analogy with the power
x^r. Taking first differences with respect to x and with unit h, we have

    \Delta x^{[r]} = (x + h)^{[r]} - x^{[r]}
                   = (x + h) x (x - h) ... (x - \overline{r - 2} h) - x(x - h) ... (x - \overline{r - 1} h)
                   = r x^{[r-1]} h,

which may be compared with the differential equation

    d x^r = r x^{r-1} \, dx.

Conversely

    \sum_{u=0}^{x-h} u^{[r]} h = \frac{x^{[r+1]}}{r + 1},

the sum extending over the values u = 0, h, 2h, ..., x - h, corresponding to

    \int_0^x x^r \, dx = \frac{x^{r+1}}{r + 1}.

The rth factorial moment about an arbitrary origin may then be defined by the equation

    \mu_{[r]}' = \sum_{j=-\infty}^{\infty} x_j^{[r]} f(x_j)        (3.16)

where we have chosen the summation sign \sum rather than the Stieltjes integral because
it is almost entirely for discontinuous distributions, or continuous distributions grouped
in intervals of width h, that the factorial moments are used. In statistical theory they
are not very prominent, but in the theory of interpolation and of curve fitting they are
sufficiently important to justify some mention of their properties.

As usual, when it is necessary to distinguish between factorial moments about the
mean and those about an arbitrary point we may write the former without the prime.
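The factorial notation and the difference relation of 3.7 can be illustrated in Python. This is a sketch with hypothetical names and data; the last check uses the fact, obtained by expanding x(x - h), that the second factorial moment equals the second ordinary moment less h times the first.

```python
def factorial_power(x, r, h=1):
    """x^[r] = x (x - h)(x - 2h) ... (x - (r - 1)h); equal to 1 when r = 0."""
    result = 1
    for i in range(r):
        result *= x - i * h
    return result


def factorial_moment(values, freqs, r, h=1):
    """r-th factorial moment about the origin of a discontinuous distribution."""
    total = sum(freqs)
    return sum(f * factorial_power(x, r, h) for x, f in zip(values, freqs)) / total


# Difference relation: (x + h)^[r] - x^[r] = r x^[r-1] h, tried for x = 7, r = 3, h = 2.
x, r, h = 7, 3, 2
difference = factorial_power(x + h, r, h) - factorial_power(x, r, h)
print(difference, r * factorial_power(x, r - 1, h) * h)

# Hypothetical distribution with frequencies 1, 3, 3, 1 at the values 0, 1, 2, 3.
values, freqs = [0, 1, 2, 3], [1, 3, 3, 1]
m1 = sum(f * v for v, f in zip(values, freqs)) / sum(freqs)
m2 = sum(f * v * v for v, f in zip(values, freqs)) / sum(freqs)
print(factorial_moment(values, freqs, 2), m2 - 1 * m1)
```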


3.8. The factorial moments obey laws of transformation similar to those of equation 
(3.5) governing ordinary moments. In fact we have the expansion * 


(a + b) = DG ae ыл 


7=0 
(v —a)! —(r—b4 с}! wherec =b — а 


- Xy — pye-n dà 


) 


VN c ; 
апа hence tinla) = (0) (фе). . . . « (3.17) 


j=0 


and hence 


which may be written symbolically 
ui, (n) = 00) + ©)”. 


3.9. By direct expansion of (3.16) it is seen that 
= Hn iz 
pp = Ka — Ма, E QU E 4. (9:18) 
Bm = Ез — За + 20а, р, 
Ma = ply — Shug + UNF He — бад 
and conversely that 
ee E 
Ha = Ща + Ptr) m 
4 Р ћи 
е = ща) + Shut + Hay , 
i = p + 6з) + Тһ?щә + Аи 
е equal the equations remain true when the primes are 


Since the first moments аг ae 
dropped and terms in first moments omitted. 


* It is clear that $(a+b)^{[r]}$ will be a polynomial of degree $r$ in $a$, and may therefore be equated to $\sum_{j=0}^{r} k_j a^{[r-j]}$, where the $k$'s are polynomials in $b$ and $h$ but do not contain $a$. Putting $a = 0$ we obtain $b^{[r]} = k_r$. Taking first differences with respect to $a$ and putting $a = 0$ we obtain $r b^{[r-1]} = k_{r-1}$. Successive differences give the $k$'s and the above result follows.


It is possible to give general formulae showing the factorial moments about one point in terms of the ordinary moments about another, and vice versa. In fact
А 


s ss = MC stones) x XM PESCE 


j =0 
Hie) = (увез + ендь) Eat ЛЫ шац 
7=0 
where $B_r^{(n)}(x)$ is the Bernoulli polynomial of order $n$ and degree $r$ in $x$, defined as the coefficient of $t^r/r!$ in
$$\frac{t^n e^{xt}}{(e^t - 1)^n}.$$
For a discussion of these polynomials and the derivation of equations (3.20) and (3.21) reference may be made to Frisch (1926).


Calculation of Factorial Moments 


3.10. The calculation of factorial moments for grouped data may be effected by a process of progressive summation which is illustrated in Table 3.2.

TABLE 3.2

(1) Frequency | (2) First Summation | (3) Second Summation | (4) Third Summation
f_1 | f_1 + f_2 + ... + f_n | |
f_2 | f_2 + f_3 + ... + f_n | f_2 + 2f_3 + ... + (n-1)f_n |
f_3 | f_3 + f_4 + ... + f_n | f_3 + 2f_4 + ... + (n-2)f_n | f_3 + 3f_4 + ... + (1/2)(n-1)(n-2)f_n
... | ... | ... | ...
f_{n-2} | f_{n-2} + f_{n-1} + f_n | f_{n-2} + 2f_{n-1} + 3f_n | f_{n-2} + 3f_{n-1} + 6f_n
f_{n-1} | f_{n-1} + f_n | f_{n-1} + 2f_n | f_{n-1} + 3f_n
f_n | f_n | f_n | f_n
Totals: 1 | mu'_[1] | (1/2) mu'_[2] | (1/6) mu'_[3]

Writing the proportional frequencies in the successive $n$ intervals as $f_1, f_2, \dots, f_n$, as shown in the left-hand column, we construct column 2 by cumulating the frequencies from the foot of the column upwards, the $j$th row containing the sum $f_j + f_{j+1} + \dots + f_n$. In column 3 the process is repeated with the rows of column 2, stopping at the second row, e.g. the $n$th row contains $f_n$, the $(n-1)$th row $(f_n + f_{n-1}) + f_n$, and so on, the second row containing the sum $(n-1)f_n + (n-2)f_{n-1} + \dots + 2f_3 + f_2$.


Column 4 repeats the process with the entries of column 3, but stopping at the third row; and so on.

Consider now the sum of the entries in column 2. In that sum $f_1$ appears once, $f_2$ twice, ..., $f_n$ $n$ times. Hence the sum is equal to
$$\sum_{j=1}^{n} j f_j = \mu'_{[1]}.$$
In column 3, $f_2$ appears once, $f_3$ 3 times, ..., $f_n$ $\tfrac12 n(n-1)$ times. Hence
$$\text{Sum} = \sum_{j=1}^{n} \tfrac12 j(j-1) f_j = \tfrac12 \mu'_{[2]}.$$
In general, the sum of the $(r+1)$th column will be given by
$$\text{Sum} = \frac{1}{r!}\,\mu'_{[r]}.$$
If the actual frequencies are used instead of the proportional frequencies the sums have to be divided by the total frequency $N$.

Thus the process of summation gives the factorial moments directly. It is a modifi- 
cation of one which is due to G. F. Hardy (cf. Elderton, 1938a). The use of the method 
in practice lies in the fact that for certain calculating machines the progressive summation 
is easier to carry out than the processes involved in the method of Example 3.1. 


Example 3.7 
Consider again the data of Table 1.7, showing the distribution of 8585 men according to height in inches. The columns on the right in Table 3.3 show the successive sums. At the top of each column there has been placed within brackets the number which would have been obtained if the summation were continued up the column one place further than is required for the sum at the foot. These bracketed figures are useful to have as a check since each must equal the sum at the foot of the preceding column.

From this table we find
$$\mu'_{[1]} = 11.020,850,320,33$$
$$\mu'_{[2]} = 117.055,096,097,84$$
$$\mu'_{[3]} = 1,194.957,483,983,69$$
$$\mu'_{[4]} = 11,702.727,082,119,98$$
From these values we may derive the ordinary moments, using equations (3.19), and find
$$\mu'_1 = 11.020,850,320,33$$
$$\mu'_2 = 128.075,946,418,2$$
$$\mu'_3 = 1,557.143,622,597,5$$
$$\mu'_4 = 19,702.878,509,027,3,$$
from which we find, for the moments about the mean,
$$\mu_2 = 6.616,805$$
$$\mu_3 = -0.207,840$$
$$\mu_4 = 137.689,185,$$
the units being one inch.


TABLE 3.3

Calculation of the Factorial Moments of a Distribution of Men according to Height in Inches (Table 1.7).

Height | Frequency | First Sum | Second Sum | Third Sum | Fourth Sum
57- | 2 | 8,585 | (94,614) | |
58- | 4 | 8,583 | 86,029 | (502,459) |
59- | 14 | 8,579 | 77,446 | 416,430 | (1,709,785)
60- | 41 | 8,565 | 68,867 | 338,984 | 1,293,355
61- | 83 | 8,524 | 60,302 | 270,117 | 954,371
62- | 169 | 8,441 | 51,778 | 209,815 | 684,254
63- | 394 | 8,272 | 43,337 | 158,037 | 474,439
64- | 669 | 7,878 | 35,065 | 114,700 | 316,402
65- | 990 | 7,209 | 27,187 | 79,635 | 201,702
66- | 1,223 | 6,219 | 19,978 | 52,448 | 122,067
67- | 1,329 | 4,996 | 13,759 | 32,470 | 69,619
68- | 1,230 | 3,667 | 8,763 | 18,711 | 37,149
69- | 1,063 | 2,437 | 5,096 | 9,948 | 18,438
70- | 646 | 1,374 | 2,659 | 4,852 | 8,490
71- | 392 | 728 | 1,285 | 2,193 | 3,638
72- | 202 | 336 | 557 | 908 | 1,445
73- | 79 | 134 | 221 | 351 | 537
74- | 32 | 55 | 87 | 130 | 186
75- | 16 | 23 | 32 | 43 | 56
76- | 5 | 7 | 9 | 11 | 13
77- | 2 | 2 | 2 | 2 | 2
TOTALS | 8,585 | 94,614 | 502,459 | 1,709,785 | 4,186,163
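The progressive summation of Table 3.3 is easily mechanised. The following is a minimal sketch in Python, offered as an illustration rather than as the original procedure; the function name `progressive_sums` is an assumption, and the unit of grouping is taken as one inch with the first interval numbered 1.

```python
from itertools import accumulate
from math import factorial

# Frequencies of Table 3.3 (heights 57- to 77- inches).
frequencies = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
               1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]

def progressive_sums(freqs, max_order):
    """Return the column totals of the successive summations.

    Each new column is formed by cumulating the previous one from the
    foot upwards; the total of the column for order r omits its first
    r - 1 rows, which corresponds to "stopping" at the rth row.
    """
    totals = []
    column = list(freqs)
    for r in range(1, max_order + 1):
        column = list(accumulate(reversed(column)))[::-1]
        totals.append(sum(column[r - 1:]))
    return totals

totals = progressive_sums(frequencies, 4)
N = sum(frequencies)
# The total for order r equals N * mu'_[r] / r!.
factorial_moments = [factorial(r) * t / N for r, t in enumerate(totals, start=1)]
```

Here `totals` corresponds to the foot of the four summation columns and `factorial_moments` to the values quoted in Example 3.7.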

Cumulants 


3.11. The moments are a set of parameters of a distribution which are useful for measuring its properties and, in certain circumstances, for specifying it. Their use in these connections will be considered in later chapters. They are not, however, the only set of parameters for the purpose, or even the best set. Another series of parameters, the so-called cumulants, have properties which are more useful from the theoretical standpoint.

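Formally, the cumulants $\kappa_1, \kappa_2, \dots, \kappa_r$ are defined by the identity in $t$
$$\exp\left\{\kappa_1 t + \frac{\kappa_2 t^2}{2!} + \dots + \frac{\kappa_r t^r}{r!} + \dots\right\} = 1 + \mu'_1 t + \frac{\mu'_2 t^2}{2!} + \dots + \frac{\mu'_r t^r}{r!} + \dots \qquad (3.22)$$
It is sometimes more convenient to write the same equation with $it$ for $t$, thus:
$$\exp\left\{\kappa_1 (it) + \frac{\kappa_2 (it)^2}{2!} + \dots + \frac{\kappa_r (it)^r}{r!} + \dots\right\} = 1 + \mu'_1 (it) + \frac{\mu'_2 (it)^2}{2!} + \dots + \frac{\mu'_r (it)^r}{r!} + \dots = \int_{-\infty}^{\infty} e^{itx}\,dF = \phi(t). \qquad (3.23)$$
Thus, whereas $\mu'_r$ is the coefficient of $(it)^r/r!$ in $\phi(t)$, the characteristic function, $\kappa_r$ is the coefficient of $(it)^r/r!$ in $\log \phi(t)$, if an expansion in power series exists.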


3.12. If in equation (3.23) the origin is changed from $a$ to $b$, where as usual $b - a = c$, the effect on $\phi(t)$ is to multiply it by $e^{-itc}$, for $\int e^{itx}\,dF$ becomes $\int e^{it(x-c)}\,dF$. Hence the effect on $\log \phi(t)$ is merely to add the term $-itc$, and consequently the coefficients in $\log \phi(t)$ are unchanged, except the first, which is decreased by $c$.

Hence the cumulants are invariant under change of origin, except the first. In this 
they stand in sharp contrast to the moments about an arbitrary point. 

Both cumulants and moments have another property of an invariantive kind, namely, that if the variate-values are multiplied by a constant $a$, $\mu_r$ and $\kappa_r$ are multiplied by $a^r$. This is at once evident from their definitions. Thus any linear transformation of the kind
$$\xi = lx + m \qquad (3.24)$$
leaves the cumulants unchanged so far as the constant $m$ is concerned and multiplies $\kappa_r$ by $l^r$. The sole exception is the first cumulant, which is equal to the mean. In particular, if we transform a distribution to standard measure, the only effect is to multiply $\kappa_r$ by $\sigma^{-r}$, $\sigma$ being the standard deviation and, as we shall see in a moment, being equal to $\kappa_2^{1/2}$.

The invariantive properties of the cumulants were the origin of their original name of semi-invariants, seminvariants or half-invariants (Thiele, 1889). It has, however, recently been shown that there are several other classes of parameters with the same property, and it seems best to reserve the word "seminvariant" for any parameter $\lambda_r$ which, under the transformation (3.24), is multiplied by $l^r$. The cumulants and the moments about the mean are thus particular cases of seminvariants.


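Relations between Moments and Cumulants

3.13. Subject to conditions of existence, we have, from (3.22),
$$1 + \frac{\mu'_1 t}{1!} + \dots + \frac{\mu'_r t^r}{r!} + \dots = \exp\left\{\frac{\kappa_1 t}{1!} + \frac{\kappa_2 t^2}{2!} + \dots + \frac{\kappa_r t^r}{r!} + \dots\right\}$$
$$= \exp\left(\frac{\kappa_1 t}{1!}\right) \exp\left(\frac{\kappa_2 t^2}{2!}\right) \dots \exp\left(\frac{\kappa_r t^r}{r!}\right) \dots$$
$$= \left\{1 + \kappa_1 t + \frac{1}{2!}\kappa_1^2 t^2 + \dots\right\} \left\{1 + \frac{\kappa_2 t^2}{2!} + \frac{1}{2!}\left(\frac{\kappa_2 t^2}{2!}\right)^2 + \dots\right\} \dots \left\{1 + \frac{\kappa_r t^r}{r!} + \frac{1}{2!}\left(\frac{\kappa_r t^r}{r!}\right)^2 + \dots\right\} \dots$$
Picking out the terms in the exponential expansions which, when multiplied together, give a power of $t^r$, we have
$$\mu'_r = \sum_{m=1}^{r} \sum \left(\frac{\kappa_{p_1}}{p_1!}\right)^{\pi_1} \left(\frac{\kappa_{p_2}}{p_2!}\right)^{\pi_2} \dots \left(\frac{\kappa_{p_m}}{p_m!}\right)^{\pi_m} \frac{r!}{\pi_1!\,\pi_2! \dots \pi_m!}, \qquad (3.25)$$
where the second summation extends over all non-negative values of the $\pi$'s such that
$$p_1\pi_1 + p_2\pi_2 + \dots + p_m\pi_m = r. \qquad (3.26)$$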


It is worth noting that the rather tedious process of writing down the explicit relations for particular values of $r$ may be shortened considerably. In fact, differentiating (3.22) by $\kappa_j$ we have
$$\frac{t^j}{j!}\left\{1 + \mu'_1 t + \dots + \frac{\mu'_r t^r}{r!} + \dots\right\} = \sum_{r} \frac{t^r}{r!}\,\frac{\partial \mu'_r}{\partial \kappa_j}$$
and hence, identifying powers of $t$,
$$\frac{\partial \mu'_r}{\partial \kappa_j} = \binom{r}{j} \mu'_{r-j}. \qquad (3.27)$$
In particular
$$\frac{\partial \mu'_r}{\partial \kappa_1} = r\mu'_{r-1}, \qquad (3.28)$$
and thus, given any $\mu'_r$ in terms of the $\kappa$'s we can write down successively those of lower orders by differentiation.
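The first ten of these expressions are, for moments about an arbitrary point:

$\mu'_1 = \kappa_1$,

$\mu'_2 = \kappa_2 + \kappa_1^2$,

$\mu'_3 = \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3$,

$\mu'_4 = \kappa_4 + 4\kappa_3\kappa_1 + 3\kappa_2^2 + 6\kappa_2\kappa_1^2 + \kappa_1^4$,

$\mu'_5 = \kappa_5 + 5\kappa_4\kappa_1 + 10\kappa_3\kappa_2 + 10\kappa_3\kappa_1^2 + 15\kappa_2^2\kappa_1 + 10\kappa_2\kappa_1^3 + \kappa_1^5$,

$\mu'_6 = \kappa_6 + 6\kappa_5\kappa_1 + 15\kappa_4\kappa_2 + 15\kappa_4\kappa_1^2 + 10\kappa_3^2 + 60\kappa_3\kappa_2\kappa_1 + 20\kappa_3\kappa_1^3 + 15\kappa_2^3 + 45\kappa_2^2\kappa_1^2 + 15\kappa_2\kappa_1^4 + \kappa_1^6$,

$\mu'_7 = \kappa_7 + 7\kappa_6\kappa_1 + 21\kappa_5\kappa_2 + 21\kappa_5\kappa_1^2 + 35\kappa_4\kappa_3 + 105\kappa_4\kappa_2\kappa_1 + 35\kappa_4\kappa_1^3 + 70\kappa_3^2\kappa_1 + 105\kappa_3\kappa_2^2 + 210\kappa_3\kappa_2\kappa_1^2 + 35\kappa_3\kappa_1^4 + 105\kappa_2^3\kappa_1 + 105\kappa_2^2\kappa_1^3 + 21\kappa_2\kappa_1^5 + \kappa_1^7$,

$\mu'_8 = \kappa_8 + 8\kappa_7\kappa_1 + 28\kappa_6\kappa_2 + 28\kappa_6\kappa_1^2 + 56\kappa_5\kappa_3 + 168\kappa_5\kappa_2\kappa_1 + 56\kappa_5\kappa_1^3 + 35\kappa_4^2 + 280\kappa_4\kappa_3\kappa_1 + 210\kappa_4\kappa_2^2 + 420\kappa_4\kappa_2\kappa_1^2 + 70\kappa_4\kappa_1^4 + 280\kappa_3^2\kappa_2 + 280\kappa_3^2\kappa_1^2 + 840\kappa_3\kappa_2^2\kappa_1 + 560\kappa_3\kappa_2\kappa_1^3 + 56\kappa_3\kappa_1^5 + 105\kappa_2^4 + 420\kappa_2^3\kappa_1^2 + 210\kappa_2^2\kappa_1^4 + 28\kappa_2\kappa_1^6 + \kappa_1^8$,

$\mu'_9 = \kappa_9 + 9\kappa_8\kappa_1 + 36\kappa_7\kappa_2 + 36\kappa_7\kappa_1^2 + 84\kappa_6\kappa_3 + 252\kappa_6\kappa_2\kappa_1 + 84\kappa_6\kappa_1^3 + 126\kappa_5\kappa_4 + 504\kappa_5\kappa_3\kappa_1 + 378\kappa_5\kappa_2^2 + 756\kappa_5\kappa_2\kappa_1^2 + 126\kappa_5\kappa_1^4 + 315\kappa_4^2\kappa_1 + 1260\kappa_4\kappa_3\kappa_2 + 1260\kappa_4\kappa_3\kappa_1^2 + 1890\kappa_4\kappa_2^2\kappa_1 + 1260\kappa_4\kappa_2\kappa_1^3 + 126\kappa_4\kappa_1^5 + 280\kappa_3^3 + 2520\kappa_3^2\kappa_2\kappa_1 + 840\kappa_3^2\kappa_1^3 + 1260\kappa_3\kappa_2^3 + 3780\kappa_3\kappa_2^2\kappa_1^2 + 1260\kappa_3\kappa_2\kappa_1^4 + 84\kappa_3\kappa_1^6 + 945\kappa_2^4\kappa_1 + 1260\kappa_2^3\kappa_1^3 + 378\kappa_2^2\kappa_1^5 + 36\kappa_2\kappa_1^7 + \kappa_1^9$,

$\mu'_{10} = \kappa_{10} + 10\kappa_9\kappa_1 + 45\kappa_8\kappa_2 + 45\kappa_8\kappa_1^2 + 120\kappa_7\kappa_3 + 360\kappa_7\kappa_2\kappa_1 + 120\kappa_7\kappa_1^3 + 210\kappa_6\kappa_4 + 840\kappa_6\kappa_3\kappa_1 + 630\kappa_6\kappa_2^2 + 1260\kappa_6\kappa_2\kappa_1^2 + 210\kappa_6\kappa_1^4 + 126\kappa_5^2 + 1260\kappa_5\kappa_4\kappa_1 + 2520\kappa_5\kappa_3\kappa_2 + 2520\kappa_5\kappa_3\kappa_1^2 + 3780\kappa_5\kappa_2^2\kappa_1 + 2520\kappa_5\kappa_2\kappa_1^3 + 252\kappa_5\kappa_1^5 + 1575\kappa_4^2\kappa_2 + 1575\kappa_4^2\kappa_1^2 + 2100\kappa_4\kappa_3^2 + 12600\kappa_4\kappa_3\kappa_2\kappa_1 + 4200\kappa_4\kappa_3\kappa_1^3 + 3150\kappa_4\kappa_2^3 + 9450\kappa_4\kappa_2^2\kappa_1^2 + 3150\kappa_4\kappa_2\kappa_1^4 + 210\kappa_4\kappa_1^6 + 2800\kappa_3^3\kappa_1 + 6300\kappa_3^2\kappa_2^2 + 12600\kappa_3^2\kappa_2\kappa_1^2 + 2100\kappa_3^2\kappa_1^4 + 12600\kappa_3\kappa_2^3\kappa_1 + 12600\kappa_3\kappa_2^2\kappa_1^3 + 2520\kappa_3\kappa_2\kappa_1^5 + 120\kappa_3\kappa_1^7 + 945\kappa_2^5 + 4725\kappa_2^4\kappa_1^2 + 3150\kappa_2^3\kappa_1^4 + 630\kappa_2^2\kappa_1^6 + 45\kappa_2\kappa_1^8 + \kappa_1^{10}$, (3.29)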


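or, for moments about the mean ($\kappa_1 = 0$),

$\mu_2 = \kappa_2$,

$\mu_3 = \kappa_3$,

$\mu_4 = \kappa_4 + 3\kappa_2^2$,

$\mu_5 = \kappa_5 + 10\kappa_3\kappa_2$,

$\mu_6 = \kappa_6 + 15\kappa_4\kappa_2 + 10\kappa_3^2 + 15\kappa_2^3$,

$\mu_7 = \kappa_7 + 21\kappa_5\kappa_2 + 35\kappa_4\kappa_3 + 105\kappa_3\kappa_2^2$,

$\mu_8 = \kappa_8 + 28\kappa_6\kappa_2 + 56\kappa_5\kappa_3 + 35\kappa_4^2 + 210\kappa_4\kappa_2^2 + 280\kappa_3^2\kappa_2 + 105\kappa_2^4$,

$\mu_9 = \kappa_9 + 36\kappa_7\kappa_2 + 84\kappa_6\kappa_3 + 126\kappa_5\kappa_4 + 378\kappa_5\kappa_2^2 + 1260\kappa_4\kappa_3\kappa_2 + 280\kappa_3^3 + 1260\kappa_3\kappa_2^3$,

$\mu_{10} = \kappa_{10} + 45\kappa_8\kappa_2 + 120\kappa_7\kappa_3 + 210\kappa_6\kappa_4 + 630\kappa_6\kappa_2^2 + 126\kappa_5^2 + 2520\kappa_5\kappa_3\kappa_2 + 1575\kappa_4^2\kappa_2 + 2100\kappa_4\kappa_3^2 + 3150\kappa_4\kappa_2^3 + 6300\kappa_3^2\kappa_2^2 + 945\kappa_2^5$. (3.30)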
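Formulae of the type (3.29) and (3.30) can be checked, or extended, by a recurrence. The following is a minimal sketch in Python; the recurrence $\mu'_r = \sum_{j=1}^{r} \binom{r-1}{j-1}\kappa_j\mu'_{r-j}$ is a standard result assumed here rather than one stated in the text, and the function name is hypothetical.

```python
from math import comb

def moments_from_cumulants(kappa):
    """kappa[j-1] holds kappa_j for j = 1..n.
    Returns [mu'_1, ..., mu'_n], the moments about the same origin."""
    n = len(kappa)
    mu = [1]  # mu'_0 = 1
    for r in range(1, n + 1):
        mu.append(sum(comb(r - 1, j - 1) * kappa[j - 1] * mu[r - j]
                      for j in range(1, r + 1)))
    return mu[1:]

# With every cumulant equal to 1 each moment is the sum of the numerical
# coefficients in the corresponding line of (3.29).
coefficient_sums = moments_from_cumulants([1] * 10)

# With kappa_1 = 0 the moments are those about the mean, as in (3.30).
kappa = [0, 2, 3, 5, 7, 11]
central = moments_from_cumulants(kappa)
mu6_formula = kappa[5] + 15 * kappa[3] * kappa[1] + 10 * kappa[2] ** 2 + 15 * kappa[1] ** 3
```

The value `central[5]` should agree with `mu6_formula`, the sixth line of (3.30).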

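Conversely we have
$$\kappa_1 t + \frac{\kappa_2 t^2}{2!} + \dots + \frac{\kappa_r t^r}{r!} + \dots = \log\left\{1 + \mu'_1 t + \dots + \frac{\mu'_r t^r}{r!} + \dots\right\}.$$
Expanding the logarithm and picking out powers of $t$ as before, we have
$$\kappa_r = r! \sum_{m=1}^{r} \sum \left(\frac{\mu'_{p_1}}{p_1!}\right)^{\pi_1} \dots \left(\frac{\mu'_{p_m}}{p_m!}\right)^{\pi_m} \frac{(-1)^{\rho-1}(\rho-1)!}{\pi_1! \dots \pi_m!}, \qquad (3.31)$$
the second summation extending over all non-negative $\pi$'s and $p$'s, subject to (3.26) and the further condition
$$\pi_1 + \pi_2 + \dots + \pi_m = \rho. \qquad (3.32)$$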


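The first ten formulae are, in terms of moments about an arbitrary point:

$\kappa_1 = \mu'_1$,

$\kappa_2 = \mu'_2 - {\mu'_1}^2$,

$\kappa_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2{\mu'_1}^3$,

$\kappa_4 = \mu'_4 - 4\mu'_3\mu'_1 - 3{\mu'_2}^2 + 12\mu'_2{\mu'_1}^2 - 6{\mu'_1}^4$,

$\kappa_5 = \mu'_5 - 5\mu'_4\mu'_1 - 10\mu'_3\mu'_2 + 20\mu'_3{\mu'_1}^2 + 30{\mu'_2}^2\mu'_1 - 60\mu'_2{\mu'_1}^3 + 24{\mu'_1}^5$,

$\kappa_6 = \mu'_6 - 6\mu'_5\mu'_1 - 15\mu'_4\mu'_2 + 30\mu'_4{\mu'_1}^2 - 10{\mu'_3}^2 + 120\mu'_3\mu'_2\mu'_1 - 120\mu'_3{\mu'_1}^3 + 30{\mu'_2}^3 - 270{\mu'_2}^2{\mu'_1}^2 + 360\mu'_2{\mu'_1}^4 - 120{\mu'_1}^6$,

$\kappa_7 = \mu'_7 - 7\mu'_6\mu'_1 - 21\mu'_5\mu'_2 + 42\mu'_5{\mu'_1}^2 - 35\mu'_4\mu'_3 + 210\mu'_4\mu'_2\mu'_1 - 210\mu'_4{\mu'_1}^3 + 140{\mu'_3}^2\mu'_1 + 210\mu'_3{\mu'_2}^2 - 1260\mu'_3\mu'_2{\mu'_1}^2 + 840\mu'_3{\mu'_1}^4 - 630{\mu'_2}^3\mu'_1 + 2520{\mu'_2}^2{\mu'_1}^3 - 2520\mu'_2{\mu'_1}^5 + 720{\mu'_1}^7$,

$\kappa_8 = \mu'_8 - 8\mu'_7\mu'_1 - 28\mu'_6\mu'_2 + 56\mu'_6{\mu'_1}^2 - 56\mu'_5\mu'_3 + 336\mu'_5\mu'_2\mu'_1 - 336\mu'_5{\mu'_1}^3 - 35{\mu'_4}^2 + 560\mu'_4\mu'_3\mu'_1 + 420\mu'_4{\mu'_2}^2 - 2520\mu'_4\mu'_2{\mu'_1}^2 + 1680\mu'_4{\mu'_1}^4 + 560{\mu'_3}^2\mu'_2 - 1680{\mu'_3}^2{\mu'_1}^2 - 5040\mu'_3{\mu'_2}^2\mu'_1 + 13440\mu'_3\mu'_2{\mu'_1}^3 - 6720\mu'_3{\mu'_1}^5 - 630{\mu'_2}^4 + 10080{\mu'_2}^3{\mu'_1}^2 - 25200{\mu'_2}^2{\mu'_1}^4 + 20160\mu'_2{\mu'_1}^6 - 5040{\mu'_1}^8$,

$\kappa_9 = \mu'_9 - 9\mu'_8\mu'_1 - 36\mu'_7\mu'_2 + 72\mu'_7{\mu'_1}^2 - 84\mu'_6\mu'_3 + 504\mu'_6\mu'_2\mu'_1 - 504\mu'_6{\mu'_1}^3 - 126\mu'_5\mu'_4 + 1008\mu'_5\mu'_3\mu'_1 + 756\mu'_5{\mu'_2}^2 - 4536\mu'_5\mu'_2{\mu'_1}^2 + 3024\mu'_5{\mu'_1}^4 + 630{\mu'_4}^2\mu'_1 + 2520\mu'_4\mu'_3\mu'_2 - 7560\mu'_4\mu'_3{\mu'_1}^2 - 11340\mu'_4{\mu'_2}^2\mu'_1 + 30240\mu'_4\mu'_2{\mu'_1}^3 - 15120\mu'_4{\mu'_1}^5 + 560{\mu'_3}^3 - 15120{\mu'_3}^2\mu'_2\mu'_1 + 20160{\mu'_3}^2{\mu'_1}^3 - 7560\mu'_3{\mu'_2}^3 + 90720\mu'_3{\mu'_2}^2{\mu'_1}^2 - 151200\mu'_3\mu'_2{\mu'_1}^4 + 60480\mu'_3{\mu'_1}^6 + 22680{\mu'_2}^4\mu'_1 - 151200{\mu'_2}^3{\mu'_1}^3 + 272160{\mu'_2}^2{\mu'_1}^5 - 181440\mu'_2{\mu'_1}^7 + 40320{\mu'_1}^9$,

$\kappa_{10} = \mu'_{10} - 10\mu'_9\mu'_1 - 45\mu'_8\mu'_2 + 90\mu'_8{\mu'_1}^2 - 120\mu'_7\mu'_3 + 720\mu'_7\mu'_2\mu'_1 - 720\mu'_7{\mu'_1}^3 - 210\mu'_6\mu'_4 + 1680\mu'_6\mu'_3\mu'_1 + 1260\mu'_6{\mu'_2}^2 - 7560\mu'_6\mu'_2{\mu'_1}^2 + 5040\mu'_6{\mu'_1}^4 - 126{\mu'_5}^2 + 2520\mu'_5\mu'_4\mu'_1 + 5040\mu'_5\mu'_3\mu'_2 - 15120\mu'_5\mu'_3{\mu'_1}^2 - 22680\mu'_5{\mu'_2}^2\mu'_1 + 60480\mu'_5\mu'_2{\mu'_1}^3 - 30240\mu'_5{\mu'_1}^5 + 3150{\mu'_4}^2\mu'_2 - 9450{\mu'_4}^2{\mu'_1}^2 + 4200\mu'_4{\mu'_3}^2 - 75600\mu'_4\mu'_3\mu'_2\mu'_1 + 100800\mu'_4\mu'_3{\mu'_1}^3 - 18900\mu'_4{\mu'_2}^3 + 226800\mu'_4{\mu'_2}^2{\mu'_1}^2 - 378000\mu'_4\mu'_2{\mu'_1}^4 + 151200\mu'_4{\mu'_1}^6 - 16800{\mu'_3}^3\mu'_1 - 37800{\mu'_3}^2{\mu'_2}^2 + 302400{\mu'_3}^2\mu'_2{\mu'_1}^2 - 252000{\mu'_3}^2{\mu'_1}^4 + 302400\mu'_3{\mu'_2}^3\mu'_1 - 1512000\mu'_3{\mu'_2}^2{\mu'_1}^3 + 1814400\mu'_3\mu'_2{\mu'_1}^5 - 604800\mu'_3{\mu'_1}^7 + 22680{\mu'_2}^5 - 567000{\mu'_2}^4{\mu'_1}^2 + 2268000{\mu'_2}^3{\mu'_1}^4 - 3175200{\mu'_2}^2{\mu'_1}^6 + 1814400\mu'_2{\mu'_1}^8 - 362880{\mu'_1}^{10}$, (3.33)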

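or, for moments about the mean,

$\kappa_2 = \mu_2$,

$\kappa_3 = \mu_3$,

$\kappa_4 = \mu_4 - 3\mu_2^2$,

$\kappa_5 = \mu_5 - 10\mu_3\mu_2$,

$\kappa_6 = \mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3$,

$\kappa_7 = \mu_7 - 21\mu_5\mu_2 - 35\mu_4\mu_3 + 210\mu_3\mu_2^2$,

$\kappa_8 = \mu_8 - 28\mu_6\mu_2 - 56\mu_5\mu_3 - 35\mu_4^2 + 420\mu_4\mu_2^2 + 560\mu_3^2\mu_2 - 630\mu_2^4$,

$\kappa_9 = \mu_9 - 36\mu_7\mu_2 - 84\mu_6\mu_3 - 126\mu_5\mu_4 + 756\mu_5\mu_2^2 + 2520\mu_4\mu_3\mu_2 + 560\mu_3^3 - 7560\mu_3\mu_2^3$,

$\kappa_{10} = \mu_{10} - 45\mu_8\mu_2 - 120\mu_7\mu_3 - 210\mu_6\mu_4 + 1260\mu_6\mu_2^2 - 126\mu_5^2 + 5040\mu_5\mu_3\mu_2 + 3150\mu_4^2\mu_2 + 4200\mu_4\mu_3^2 - 18900\mu_4\mu_2^3 - 37800\mu_3^2\mu_2^2 + 22680\mu_2^5$. (3.34)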

Existence of Cumulants 


3.14. The formal expression (3.22) may be regarded as defining the cumulants in terms of the moments, and it is thus evident that the cumulant of order $r$ exists if the moments of orders $r$ and lower exist. If, however, we look to the equation
$$\exp\left\{\sum_{r=1}^{\infty} \kappa_r \frac{(it)^r}{r!}\right\} = \phi(t)$$
as defining the cumulants, it is not quite so easy to show that $\kappa_r$ exists if $\mu'_r$ and lower moments exist. It may, however, be shown that $\kappa_r$ exists if $\nu_r$, the absolute moment, exists, and this is sufficient for all ordinary purposes. Some care is necessary with the proof because the variable $t$ in the characteristic function is real, but there also appears the complex quantity $i$.

We have
$$\phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF.$$
Expanding the exponential we have, if the moments up to $\mu'_r$ exist,
$$\phi(t) = \sum_{j=0}^{r} \mu'_j \frac{(it)^j}{j!} + R_r,$$


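where
$$R_r = \int_{-\infty}^{\infty} dF \left\{ e^{itx} - \sum_{j=0}^{r} \frac{(itx)^j}{j!} \right\} = \int_{-\infty}^{\infty} dF \left\{ \cos xt + i \sin xt - \sum_{j=0}^{r} \frac{(itx)^j}{j!} \right\}.$$
Considering the real and imaginary terms separately, we have, if $r$ is even,
$$R_r = \int_{-\infty}^{\infty} dF \left\{ \cos xt - \sum_{j=0}^{r/2} (-1)^j \frac{(xt)^{2j}}{(2j)!} \right\} + i \int_{-\infty}^{\infty} dF \left\{ \sin xt - \sum_{j=1}^{r/2} (-1)^{j-1} \frac{(xt)^{2j-1}}{(2j-1)!} \right\}.$$
The real term in the integrand consists of $(-1)^{r/2+1} \dfrac{(xt)^r}{r!}$ plus the remainder after $r/2$ terms of the Maclaurin expansion of $\cos xt$, which is equal to $(-1)^{r/2} \dfrac{(xt)^r}{r!} \cos \theta xt$, where $0 < \theta < 1$. The modulus of the whole integrand is thus not greater than $2\dfrac{|xt|^r}{r!}$. Similarly for the imaginary terms. Hence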
[2,1 < sf Га ар 


[ар 
A similar result follows if $r$ is odd. Now if $\mu'_r$ exists only as a principal value, it does not necessarily follow that $\nu_r$ exists. But if the latter exists we have
$$\phi(t) = \sum_{j=0}^{r} \mu'_j \frac{(it)^j}{j!} + o(t^r).$$
We may then, for some small $t$, take logarithms and expand, obtaining
$$\log \phi(t) = \sum_{j=1}^{r} \kappa_j \frac{(it)^j}{j!} + o(t^r), \qquad (3.35)$$
the coefficients $\kappa_j$ being the cumulants by definition. Hence if $\nu_r$ exists, $\kappa_r$ and cumulants of lower orders exist.


Calculation of Cumulants 


3.15. The cumulants are not, like the moments, determinable directly by summatory or integrative processes, and to find them it is necessary either to ascertain the moments and then apply equations (3.33), or to derive them from the characteristic function. For the latter case we have, from (3.35),
$$\kappa_r = (-i)^r \left[\frac{d^r}{dt^r} \log \phi(t)\right]_{t=0}. \qquad (3.36)$$
The following examples will illustrate the processes involved.
Example 3.8 


In Example 3.7 we found the following values for the moments about the mean of the height data of Table 1.7:
$$\mu'_1 = 11.020,850$$
$$\mu_2 = 6.616,805$$
$$\mu_3 = -0.207,840$$
$$\mu_4 = 137.689,185,$$
whence, from (3.34), $\kappa_2$ and $\kappa_3$ have the same values as $\mu_2$ and $\mu_3$, and
$$\kappa_4 = \mu_4 - 3\mu_2^2 = 6.342,86.$$
$\kappa_1$ is the same as $\mu'_1$, in this case measured from the centre of the interval 56- inches.

The same results would, of course, have been obtained if we had used equations (3.33) and moments about the origin.


Example 3.9 
Consider the discontinuous distribution whose frequencies at the values $0, 1, 2, \dots, j, \dots$ are
$$e^{-m}\left(1,\; m,\; \frac{m^2}{2!},\; \dots,\; \frac{m^j}{j!},\; \dots\right).$$
The characteristic function is given by
$$\phi(t) = e^{-m} \sum_{j=0}^{\infty} \frac{m^j e^{itj}}{j!} = e^{-m} \exp(m e^{it}) = \exp m(e^{it} - 1).$$
Since for any $r$ the absolute moment is the same as the ordinary moment, we have
$$\nu_r = e^{-m} \sum_{j=0}^{\infty} \frac{m^j j^r}{j!},$$
and since this converges * cumulants of all orders exist. They are therefore given by the expansion of $\log \phi(t)$ as a power series in $t$. But
$$\log \phi(t) = m(e^{it} - 1) = m \sum_{j=1}^{\infty} \frac{(it)^j}{j!}$$
and hence
$$\kappa_r = m$$
for all $r$. Thus all cumulants of the distribution are equal to $m$.

* For the ratio of the $(n+1)$th term of the series to the $n$th is
$$\frac{m^{n+1}(n+1)^r}{(n+1)!} \Big/ \frac{m^n n^r}{n!} = \frac{m}{n+1}\left(1 + \frac{1}{n}\right)^r \to 0,$$
and thus the series converges for all finite values of $m$.
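The conclusion of Example 3.9 can be verified numerically. The following is a minimal sketch in Python: it sums the series for the moments of this distribution directly and converts them to cumulants by the standard recurrence (an assumption, since the text uses the characteristic function instead). The value of `m`, the truncation at 200 terms and the function names are arbitrary illustrative choices.

```python
from math import comb, exp

def series_moments(m, order, terms=200):
    """Moments about the origin, e^{-m} * sum_j j^r m^j / j!,
    with the series truncated after `terms` terms."""
    moments = []
    for r in range(1, order + 1):
        weight = exp(-m)          # frequency at j = 0
        total = 0.0
        for j in range(terms):
            total += weight * j ** r
            weight *= m / (j + 1)  # frequency at j + 1
        moments.append(total)
    return moments

def cumulants_from_moments(mu):
    """kappa_r = mu'_r - sum_{j=1}^{r-1} C(r-1, j-1) kappa_j mu'_{r-j}."""
    m = [1.0] + list(mu)
    kappa = []
    for r in range(1, len(mu) + 1):
        kappa.append(m[r] - sum(comb(r - 1, j - 1) * kappa[j - 1] * m[r - j]
                                for j in range(1, r)))
    return kappa

m = 2.5
kappa = cumulants_from_moments(series_moments(m, 6))
```

Each entry of `kappa` should equal `m` to within rounding error.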




Example 3.10 
In Example 3.4 we found, in effect, for the characteristic function of the normal distribution
$$dF = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\,dx, \qquad -\infty \le x \le \infty,$$
$$\phi(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx - x^2/2\sigma^2}\,dx = e^{-t^2\sigma^2/2},$$
$$\log \phi(t) = -\frac{t^2\sigma^2}{2}.$$
It is easily seen that the absolute moments and hence cumulants of all orders exist. Thus $\kappa_r$ is the coefficient of $(it)^r/r!$ in $\log \phi(t)$, i.e. for the normal distribution all cumulants of order higher than the second are zero. The second cumulant is equal to $\sigma^2$.


Example 3.11 
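In Example 3.6 it was found that for the distribution
$$dF = \frac{\alpha^\gamma}{\Gamma(\gamma)}\, e^{-\alpha x} x^{\gamma-1}\,dx, \qquad \alpha > 0,\ \gamma > 0,\ 0 \le x \le \infty,$$
the characteristic function is given by
$$\phi(t) = \frac{1}{\left(1 - \dfrac{it}{\alpha}\right)^{\gamma}}.$$
It is readily verified that cumulants of all orders exist and hence
$$\kappa_r = \text{coeff. of } \frac{(it)^r}{r!} \text{ in } -\gamma \log\left(1 - \frac{it}{\alpha}\right) = \gamma (r-1)!\, \alpha^{-r}.$$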


Example 3.12 
Consider again the distribution of Example 3.3,
$$dF = \frac{k\,dx}{(1 + x^2)^m}, \qquad m \ge 1,\ -\infty \le x \le \infty.$$
The characteristic function is given by
$$\phi(t) = k \int_{-\infty}^{\infty} \frac{e^{itx}}{(1 + x^2)^m}\,dx,$$
which, since $\sin xt$ is an odd function, reduces to
$$k \int_{-\infty}^{\infty} \frac{\cos xt}{(1 + x^2)^m}\,dx.$$
This integral may be evaluated by complex integration round a contour consisting of the x-axis, the infinite semicircle above the x-axis and the infinitely small circle round the point $x = i$. It is found *

$$\phi(t) = \frac{k\pi e^{-|t|}}{2^{2m-2}(m-1)!} \left\{ (2|t|)^{m-1} + m(m-1)(2|t|)^{m-2} + \frac{(m+1)m(m-1)(m-2)}{2!}(2|t|)^{m-3} + \dots + \frac{(2m-2)!}{(m-1)!} \right\}.$$
If $r < 2m - 1$ the absolute moment of order $r$,
$$\nu_r = k \int_{-\infty}^{\infty} \frac{|x|^r\,dx}{(1 + x^2)^m},$$
exists and hence so does the cumulant of order $r$. But in this case we cannot expand $\log \phi(t)$ in an infinite series of powers of $t$, though this might perhaps be thought possible from the form of $\phi(t)$. In fact, we can only expand $\log \phi(t)$ in powers of $t$ up to the point at which the differential coefficients of $\phi(t)$ exist, for $t = 0$.

To simplify the discussion, consider the case when $m = 2$. We have then, since $k = 2/\pi$ in this case,
$$\phi(t) = e^{-|t|}(1 + |t|),$$
$$\log \phi(t) = -|t| + \log(1 + |t|).$$
If $t$ is positive this equals
$$-\frac{t^2}{2} + \frac{t^3}{3} - \frac{t^4}{4} + \dots$$
but if $t$ is negative it equals
$$-\frac{t^2}{2} - \frac{t^3}{3} - \frac{t^4}{4} - \dots,$$
the two expressions differing in the sign of the term in $t^3$ and every second term thereafter.

There is thus no unique expansion of $\log \phi(t)$ in powers of $t$ about the point $t = 0$. There are two forms of the function expressing $\log \phi(t)$ according as $t$ is positive or negative. However, these expressions coincide as far as their terms in $t$ and $t^2$, and the first and second differential coefficients of $\log \phi(t)$ are uniquely defined when $t = 0$. Thus the first and second cumulants exist and are given by
$$\kappa_1 = 0,$$
$$\kappa_2 = 1.$$
Cumulants of higher orders do not exist.

* Results of this kind are given in several text-books of analysis, sometimes incorrectly, e.g. it is sometimes stated that
$$\int_0^{\infty} \frac{\cos tx}{1 + x^2}\,dx = \tfrac12 \pi e^{-t},$$
which is only true when $t > 0$. The appearance of the modulus in the expression above is crucial for the purposes of the example. A correct proof is given in J. Edwards, Integral Calculus, Vol. 2, article 1326.

Corrections for Grouping

3.16. When moments are calculated from a numerically specified distribution which is grouped, there is present a certain amount of approximation owing to the fact that the frequencies are assumed to be concentrated at the mid-points of intervals. It is possible to correct for this effect under certain conditions.

Suppose the frequency function $f(x)$ to be continuous. If the range is divided into intervals of width $h$, we are given, not the values of $f(x)$ at all points but the frequencies in those intervals, e.g. the frequency in the $j$th interval, centred at $x_j$, will be
$$f_j = \int_{-h/2}^{h/2} f(x_j + \xi)\,d\xi.$$
We will denote the moments calculated from grouped frequencies (the "raw" moments) with a bar, so that we have
$$\bar\mu'_r = \sum_{j=-\infty}^{\infty} x_j^r f_j = \sum_{j=-\infty}^{\infty} x_j^r \int_{-h/2}^{h/2} f(x_j + \xi)\,d\xi. \qquad (3.37)$$
The true moment, if it exists, is given by
$$\mu'_r = \int_{-\infty}^{\infty} x^r f(x)\,dx$$
and it is required to investigate the relationship between the $\bar\mu$'s and the $\mu$'s.


Now we have, in virtue of the Euler-Maclaurin sum formula, for an arbitrary function $k(x)$ which has derivatives of the $m$th order,


fee ах = (к(а) + к(а +h) + к(а + 2h) 4- . . . + k(a + — 1h) + к(а + nh)) 


n hi-1i 4 anh 
-A в [<-> | ИЕТ 


j=2 


where S,, is a remainder term which may be expressed as 


nh" 


S, = — rome a + Onh) 0 = 0-< 1 
m even, 
and 2nk™ (1) (m) 
m. < | m! Ву 1(2)к (a an Onh) | > 0 <b <1 


m odd.* 


* Cf. Milne Thomson, The Calculus of Finite Differences, section 7.5, for the general Euler-Maclaurin expansion. The form of $S_m$ when $m$ is even is given in section 7.5 of that book, and the above form when $m$ is odd may be derived similarly.

In our convention the Bernoulli number $B_j$ is defined as the coefficient of $t^j/j!$ in $t/(e^t - 1)$. The Bernoulli polynomial has already been defined in 3.9. Explicitly $B_0 = 1$, $B_1 = -\tfrac12$, $B_2 = \tfrac16$, $B_3 = B_5 = \dots = 0$, $B_4 = -\tfrac1{30}$, $B_6 = \tfrac1{42}$, $B_8 = -\tfrac1{30}$, $B_{10} = \tfrac5{66}$, $B_{12} = -\tfrac{691}{2730}$, $B_{14} = \tfrac76$.

Suppose now that $f(x)$ is of finite range, from $a$ to $b$, derivable up to the $m$th order, and that at the end of the range $f(x)$ and its first $m$ derivatives vanish. Then $f(x)$ and the first $m$ derivatives are continuous throughout the range $-\infty$ to $+\infty$ and the function
$$k(x) = x^r \int_{-h/2}^{h/2} f(x + \xi)\,d\xi, \qquad (3.39)$$
together with its first $m + 1$ derivatives, will also be continuous throughout that range. If $a$ is infinite (and similarly for $b$) it is assumed that
$$\lim_{x \to -\infty} x^r f^{(j)}(x) \to 0$$
for all values of $j$ up to and including $m$, in which case $k(x)$ and its first $m + 1$ derivatives will also tend to zero. Thus in either case the Euler-Maclaurin expansion (3.38) is valid for $k(x)$ given by (3.39) and we may write
$$\left[k^{(j)}(x)\right]_{-\infty}^{\infty} = 0, \qquad j < m + 1.$$


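Substituting in (3.38) we have, since $k(-\infty) = k(+\infty) = 0$,
$$\frac1h \int_{-\infty}^{\infty} x^r\,dx \int_{-h/2}^{h/2} f(x + \xi)\,d\xi = \sum_{j=-\infty}^{\infty} x_j^r \int_{-h/2}^{h/2} f(x_j + \xi)\,d\xi - S_{m+1} = \bar\mu'_r - S_{m+1}. \qquad (3.40)$$
The integral on the left of this expression is equal to
$$\frac1h \int_{-h/2}^{h/2} \int_{-\infty}^{\infty} x^r f(x + \xi)\,dx\,d\xi, \qquad (3.41)$$
provided that the multiple integral exists. If, in addition, it is absolutely convergent we may substitute $x$ for $x + \xi$ and integrate with respect to $\xi$. We shall then have
$$\bar\mu'_r - S_{m+1} = \frac1h \int_{-\infty}^{\infty} \int_{-h/2}^{h/2} (x - \xi)^r f(x)\,d\xi\,dx$$
$$= \frac1h \int_{-\infty}^{\infty} \frac{(x + \tfrac12 h)^{r+1} - (x - \tfrac12 h)^{r+1}}{r + 1}\, f(x)\,dx$$
$$= \sum_{j=0}^{[r/2]} \binom{r}{2j} \left(\frac h2\right)^{2j} \frac{1}{2j + 1}\, \mu'_{r-2j}, \qquad (3.42)$$
where $[r/2]$ is the integral part of $r/2$.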


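Thus if $S_{m+1}$ may be neglected, (3.42) gives the raw moments in terms of the actual moments. In practice we require the latter in terms of the former and it is easy to find from (3.42) the following expressions:
$$\mu'_1 = \bar\mu'_1,$$
$$\mu'_2 = \bar\mu'_2 - \tfrac1{12} h^2,$$
$$\mu'_3 = \bar\mu'_3 - \tfrac14 \bar\mu'_1 h^2,$$
$$\mu'_4 = \bar\mu'_4 - \tfrac12 \bar\mu'_2 h^2 + \tfrac7{240} h^4,$$
$$\mu'_5 = \bar\mu'_5 - \tfrac56 \bar\mu'_3 h^2 + \tfrac7{48} \bar\mu'_1 h^4,$$
$$\mu'_6 = \bar\mu'_6 - \tfrac54 \bar\mu'_4 h^2 + \tfrac7{16} \bar\mu'_2 h^4 - \tfrac{31}{1344} h^6. \qquad (3.43)$$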

The general expression for these formulae is
$$\mu'_r = \sum_{j=0}^{r} \binom{r}{j} (2^{1-j} - 1) B_j h^j \bar\mu'_{r-j}, \qquad (3.44)$$
where $B_j$ is the Bernoulli number of order $j$. (Cf. Wold, 1934a.)
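The coefficients in (3.43) follow from the general expression (3.44) once the Bernoulli numbers are available. The following is a minimal sketch in Python using exact fractions; the function names are hypothetical, and the recurrence used to generate the Bernoulli numbers (with $B_1 = -\tfrac12$) is a standard one assumed here rather than taken from the text.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n):
    """B_0..B_n, the coefficients of t^j/j! in t/(e^t - 1), so B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def sheppard_coefficients(r):
    """Coefficients c_j in  mu'_r = sum_j c_j h^j (raw moment of order r - j),
    following the general expression (3.44)."""
    B = bernoulli_numbers(r)
    return [comb(r, j) * (Fraction(2) ** (1 - j) - 1) * B[j] for j in range(r + 1)]

def corrected_moment(raw, r, h):
    """raw[k] is the raw moment of order k (raw[0] = 1)."""
    return sum(float(c) * h ** j * raw[r - j]
               for j, c in enumerate(sheppard_coefficients(r)))

c2 = sheppard_coefficients(2)
c4 = sheppard_coefficients(4)
c6 = sheppard_coefficients(6)
```

For example, `c4` gives the coefficients 1, 0, -1/2, 0, 7/240 of the fourth line of (3.43).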


3.17. These are the corrections known as Sheppard's. It is important to realise the conditions under which they were obtained.

(a) It is assumed that $f(x)$ is bounded and tends monotonically to zero in the directions in which the range is infinite.

(b) It is assumed that the multiple integral (3.41) is absolutely convergent. This is equivalent to supposing that the absolute moment of order $r$ exists. If $f(x)$ is finite in range and bounded, the multiple integral is certainly absolutely convergent. If the range is not finite, since $f(x)$ tends to zero monotonically in the direction or directions of infinite range,
$$\frac1h \int_{-h/2}^{h/2} \int_{-\infty}^{\infty} |x^r f(x + \xi)|\,dx\,d\xi$$
will converge or diverge with
$$\frac1h \int_{-h/2}^{h/2} \int_{-\infty}^{\infty} |x^r| f(x)\,dx\,d\xi,$$
i.e. with
$$\int_{-\infty}^{\infty} |x^r| f(x)\,dx,$$
which is the absolute moment of order $r$.

(c) It is assumed that $f(x)$ and its first $m$ derivatives vanish at the terminal points of the range when the range is finite, or that
$$\lim x^r f^{(j)}(x) \to 0$$
for all $j$ up to and including $m$ when the range is infinite.

(d) It is assumed that $S_{m+1}$ is negligible.


BB m , " 
© are less than 627 ™ magnitude * апа hence ; m+1 İS of order 


Now both By and 
шоо ОЙ 
m+ 
a multiplied by some value of f(x) in the Tange. Thus if h is small, the range is 


finite and f(x) is small; §,,., will be small and may be neglected, In particular, if 


Е UM РИУ: - (3.45) 


the Sheppard corrections will give the moments accurate to order т 


; Le. to the order 
of the terms applied in making the corrections, 


3.18. The foregoing discussion is rigorous, but the corrections may be applied in practice with considerable confidence whenever there is high-order terminal contact.

Example 3.13


Consider the distribution
$$dF = \frac{1}{B(12, 6)}\, x^{11}(1 - x)^5\,dx, \qquad 0 \le x \le 1,$$
a case of the so-called Type I distribution. The exact frequencies for intervals of 0.1 may be obtained from the Tables of the Incomplete B-Function, and are as follows:

Centre of Interval | Frequency
0.05 | 0.000,000,0
0.15 | 0.000,009,2
0.25 | 0.000,646,8
0.35 | 0.009,938,2
0.45 | 0.061,137,4
0.55 | 0.192,199,5
0.65 | 0.332,887,7
0.75 | 0.297,479,9
0.85 | 0.101,033,7
0.95 | 0.004,667,5
TOTAL | 1.000,000,0

The raw moments about $x = 0$ are shown in the following table:

Moment | Raw | Exact | Corrected
mu'_1 | 0.666,662,8 | 0.666,666,7 | 0.666,662,8
mu'_2 | 0.456,965,5 | 0.456,140,4 | 0.456,132,2
mu'_3 | 0.320,952,3 | 0.319,298,2 | 0.319,285,7
mu'_4 | 0.230,335,1 | 0.228,070,2 | 0.228,053,2
mu'_5 | 0.168,512,9 | 0.165,869,2 | 0.165,848,0
mu'_6 | 0.125,433,9 | 0.122,599,9 | 0.122,574,7
з Ву 2-1-1 @ 1 
* For Bs, 1—0,j — 0 and aj = om PM Ge) 
n=] 
; : 4(—1)i-1 
(cf. Milne Thomson, loc, cit.) a 
(22))2j 


and further Bp = ( Te ) B 


2» 


The exact values of the moments are calculable by evaluating integrals of the type $\int_0^1 x^p (1 - x)^q\,dx$ and are shown in the third column. The final column shows the results obtained by applying equations (3.43), e.g.
$$\mu'_2 = \bar\mu'_2 - h^2/12 = 0.456,965,5 - 0.000,833,3 = 0.456,132,2.$$
At the terminal $x = 1$, $f(x)$ and its derivatives up to the fourth vanish. At the other end, derivatives up to the tenth vanish. The function is bounded, of finite range, and the derivatives remain finite throughout the range. In virtue of (3.45) it is to be expected that corrected moments of third and lower orders will be accurate to the order of the terms in the corrections, i.e. $\mu'_2$ is accurate to order $h^2/12$ (0.001) and $\mu'_3$ to order $h^4$ (0.0001). Actually they are considerably more accurate than this. The corrected fourth moment is in error by a term of order $2 \times 10^{-5}$, and this is of the same magnitude as the correcting term $\tfrac7{240} h^4$ used in arriving at it. Similarly the corrected fifth and sixth moments are in error by a term of order $10^{-5}$, of the same order as one of the correcting terms to the fifth moment, $\tfrac7{48} \bar\mu'_1 h^4$, and of the same order as or greater order than two correcting terms to the sixth moment, $\tfrac7{16} \bar\mu'_2 h^4$ and $\tfrac{31}{1344} h^6$.

Thus the corrected moments are in all cases a substantial improvement on the raw moments; but in applying the corrections it is necessary to guard against being misled about the accuracy of the final result by the apparent precision of some of the small corrective terms.
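The figures of Example 3.13 can be reproduced from the grouped frequencies alone. The following is a minimal sketch in Python; it applies the first four lines of (3.43) with $h = 0.1$ and compares the results with the exact moments of the distribution, which for this case reduce to a simple product. The variable and function names are illustrative assumptions.

```python
# Interval centres and exact grouped frequencies of Example 3.13.
centres = [0.05 + 0.1 * k for k in range(10)]
frequencies = [0.0000000, 0.0000092, 0.0006468, 0.0099382, 0.0611374,
               0.1921995, 0.3328877, 0.2974799, 0.1010337, 0.0046675]
h = 0.1

# Raw moments about x = 0: frequencies treated as concentrated at the centres.
raw = [sum(f * c ** r for c, f in zip(centres, frequencies)) for r in range(5)]

# Sheppard's corrections, equations (3.43).
corrected = {
    1: raw[1],
    2: raw[2] - h ** 2 / 12,
    3: raw[3] - raw[1] * h ** 2 / 4,
    4: raw[4] - raw[2] * h ** 2 / 2 + 7 * h ** 4 / 240,
}

def exact_moment(r, p=12, q=6):
    """Moment of order r about zero of x^(p-1) (1-x)^(q-1) / B(p, q)."""
    value = 1.0
    for i in range(r):
        value *= (p + i) / (p + q + i)
    return value

errors_raw = {r: abs(raw[r] - exact_moment(r)) for r in (2, 3, 4)}
errors_corrected = {r: abs(corrected[r] - exact_moment(r)) for r in (2, 3, 4)}
```

In each of the orders 2, 3 and 4 the entry of `errors_corrected` is much smaller than the corresponding entry of `errors_raw`.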


Example 3.14 


As an illustration of the way in which Sheppard's corrections break down when the condition for high-order contact is violated, an example is taken from a paper by Pairman and Pearson (1918). The following table shows the frequencies in a certain range of the normal distribution
$$dF = \frac{10}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx$$
with intervals of width 0.5.

Interval centred at | Frequency
1.5 | 0.655,91
2.0 | 0.278,34
2.5 | 0.092,45
3.0 | 0.024,02
3.5 | 0.004,89
4.0 | 0.000,78
4.5 | 0.000,10
5.0 | 0.000,01
TOTAL | 1.056,50

The distribution has high-order contact at one end but not at the start of the curve, being in fact J-shaped and very abrupt at that point.

The following table shows the raw moments about the mean up to the fourth order, the moments with Sheppard's corrections and the true moments calculated from the continuous normal distribution:

Moment | Raw | Exact | Corrected
mu_2 | 0.158,524 | 0.172,222 | 0.137,691
mu_3 | 0.104,220 | 0.098,619 | 0.104,226
mu_4 | 0.149,090 | 0.156,405 | 0.131,097

It is clear that, at least for 
ions may fail completely, 


(3.43), to get the 
moments about the mean, 


Т grouping corrections 


Average Corrections 


3.21. There is a distinct type of problem which 
Suppose there is given a distributio 


of x, ў varying from — 90 to co by 
i+ 3h, v, varies from X, 


ii, = 2 bf 


integra] value 
— dh to Хуф dn, * As 


| NOS 
j—— E 


"иша 


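Denoting by $E(\bar\mu'_r)$ the average as $x_j$ varies from $X_j - \tfrac12 h$ to $X_j + \tfrac12 h$, we have
$$E(\bar\mu'_r) = \frac1h \int_{X_j - h/2}^{X_j + h/2} \sum_{j=-\infty}^{\infty} x_j^r \int_{-h/2}^{h/2} f(x_j + \xi)\,d\xi\,dx_j$$
$$= \frac1h \sum_{j=-\infty}^{\infty} \int_{X_j - h/2}^{X_j + h/2} x_j^r \int_{-h/2}^{h/2} f(x_j + \xi)\,d\xi\,dx_j$$
$$= \frac1h \int_{-\infty}^{\infty} x^r \int_{-h/2}^{h/2} f(x + \xi)\,d\xi\,dx, \qquad (3.46)$$
which is the same as equation (3.40) with the omission of $S_{m+1}$ and the substitution of $E(\bar\mu'_r)$ for $\bar\mu'_r$. Thus the Sheppard corrections apply for the average group-moments whatever the nature of the terminal contact.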

They cannot, however, be applied indiscriminately on that ground. In place of the 
conditions about terminal contact, which ensure the applicability of Sheppard’s corrections 
to any particular distribution, there is the condition that the grouping intervals are located 
at random on the range, which implies that although the corrections may be wrong in any 
given instance, the average effect in a large number of cases will be correct. In actual 
fact the condition about the random location of grouping does not operate very frequently 
for J- and U-shaped distributions, where the Sheppard corrections would not ordinarily 
apply ; for instance, in a distribution of incomes or deaths at given ages it is almost inevitable 
to begin the grouping at zero. 


3.22. It is also illegitimate to drop the dashes in order to obtain corrections for moments about the mean. If the mean of the grouped distribution is denoted by y, the average value of the rth moment about the mean is given by


[3E 


BG) =i" ww", Де + E dede, 


№) _h 
z 


where y is a function of v and the transformation of the integral which has been used earlier 
in this chapter is no longer legitimate. Explicit expressions for average corrections to 
moments about the mean have not yet been obtained. From a consideration of some 
particular distributions, however, Kendall (1938) concluded that for all ordinary purposes 
it is sufficient to use equations (3.43) as if the mean were a fixed point. 


3.23. The Sheppard corrections have also been considered from a slightly different point of view (Fisher, 1921). As the centres of the intervals move along the variate axis the raw moments vary according to the different groupings which result; and this variation is evidently periodic of period h. We may thus write


eL 
B= MU) ed 


f uy 




BS 


and may put this equal to

$$A_0 + A_1\sin\theta + A_2\sin 2\theta + \dots + B_1\cos\theta + B_2\cos 2\theta + \dots$$



Then, multiplying by sin 50 or cos sO and integrating from 0 to 2 


2: 
vir р E ie sin 80 ae| шш аз 


j2—- 


7, We have 


© 


mU 
1 2a MSIE 
В, = = 27 i cos s0 aol аса) da: 


Jaci 


and in particular 
I ud 
4-5 5 |" а. 2а) de. 


pu 
tts 


n. ror 
NC z 


= лә ef i dt 
2 


h 
Tlf F 
= efe +8) d£ dz 
> E 


which is the same as (3.41) and (3.46), and thus leads to the Sheppard Corrections 
For the periodic terms we have > 


2r? dmt te 
A, = iJ, sin dé [1 Cf) da: 


qh 

2(° TIS 9. 
ка dx т sjin 2796 
5 0 " эш: 


For some mathematically specified distributions we are able to consider the magnitude
of these periodic terms. For instance, for the normal curve referred to the true mean
we have, since


ar 9л? 9zsx 
ar * Pein b gr. — 2 cos CL? cos лз 
h) н h E h 
р = Zase 1 
A, = (— yr oe ой 29а T 
= TS, h gx 
p Sese 
= tie л 
(ye 
iD, = 0, 2 


where ${}_1A_s$ and ${}_1B_s$ refer to the coefficients for the corrections to the mean. The grouping
error of the mean is thus

$$-\frac{h}{\pi}\left(e^{-2\pi^2\sigma^2/h^2}\sin\theta - \tfrac{1}{2}e^{-8\pi^2\sigma^2/h^2}\sin 2\theta + \dots\right).$$

For a grouping in which $\sigma = h$ (a very coarse grouping) this is, approximately, $-\dfrac{h}{\pi}e^{-2\pi^2}\sin\theta$,
and thus cannot be greater in magnitude than $\dfrac{h}{\pi}e^{-2\pi^2}$.


3.24. Average corrections may also be applied to discrete data which have been
grouped in wider intervals but are different from those of the continuous case. Cf. Exercise
3.13 and C. C. Craig (1936).


Sheppard’s Corrections to Factorial Moments 


3.25. It has been shown by Wold (1934a) that for factorial moments the
corrections are as follows:


Hi = [De ] 
he 
Hey = Ёа — 12 
Я = A? 
Hig = His) = FR 1] +4 aT 


" E ч m ч 71 
Ши = Bu — > Be + Aaa — LM | e 
2n 80 © e (3.47) 
z 7 Бр, Б» il 31 
Ma = Pii — gita! + > gh — igo + z^ 


5 a D 2, 919. s 
Mig) = Pt — gE? + a — 3s he 


912 
4i8 


AD 2 дй» — 312948 


and in general are given by 


r 


, r j 3\;;-, 
fis > or (Sms P* V M NUES 


тер 


where the Bernouilli polynomial BY+®(3) is equal to 


(= 1)*1 (3)! і +1 + + me j> Я! 
СЕО ат) 
апа BPE) =1, BPG) = 0. 


Sheppard’s Corrections for Cumulants 


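3.26. As in section 3.16, and under the same conditions, we have, writing $\theta$ for $it$,

$$\sum_{j=-\infty}^{\infty} e^{\theta x_j}\int_{-h/2}^{h/2} f(x_j+\xi)\,d\xi
= \frac{1}{h}\int_{-h/2}^{h/2} e^{-\theta\xi}\,d\xi\int_{-\infty}^{\infty} e^{\theta x}f(x)\,dx
= \frac{\sinh\frac{1}{2}\theta h}{\frac{1}{2}\theta h}\int_{-\infty}^{\infty} e^{\theta x}f(x)\,dx. \qquad (3.49)$$

The expression on the left gives the characteristic function for the grouped data, and the
integral on the right the true characteristic function. Taking logarithms of both sides
and noting * that

$$\log\frac{\sinh\frac{1}{2}\theta h}{\frac{1}{2}\theta h} = \sum_{r=2}^{\infty}\frac{B_r(\theta h)^r}{r\cdot r!},$$

we have, for the coefficient in $\theta^r/r!$,

$$\kappa_r = \bar{\kappa}_r - \frac{B_r}{r}h^r, \qquad r > 1, \qquad (3.50)$$

an attractively simple result for the Sheppard corrections to cumulants. Since all B's
of odd order are zero except $B_1$ and the first cumulant is equal to the mean, no cumulant
of odd order needs any correction. For the others we have

$$\kappa_2 = \bar{\kappa}_2 - \frac{h^2}{12}, \qquad
\kappa_4 = \bar{\kappa}_4 + \frac{h^4}{120}, \qquad
\kappa_6 = \bar{\kappa}_6 - \frac{h^6}{252}. \qquad (3.51)$$

* By definition

$$\frac{\theta}{e^{\theta}-1} = \sum_{r=0}^{\infty}\frac{B_r\theta^r}{r!}$$

and hence

$$\frac{1}{e^{\theta}-1} - \frac{1}{\theta} + \frac{1}{2} = \sum_{r=2}^{\infty}\frac{B_r\theta^{r-1}}{r!};$$

integrating from 0 to $\theta$ we have the above result.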
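
The rule that the grouped cumulant of order r exceeds the true one by $B_r h^r / r$ can be tabulated mechanically. The following is a minimal sketch, assuming the Bernoulli numbers are taken with the convention $B_1 = -\tfrac{1}{2}$ and generated by their standard recurrence; the function names are illustrative and are not part of the text.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """Return [B_0, ..., B_n_max] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        # Standard recurrence: sum_{k=0}^{n} C(n+1, k) * B_k = 0 for n >= 1.
        acc = sum(comb(n + 1, k) * B[k] for k in range(n))
        B.append(-acc / (n + 1))
    return B


def sheppard_cumulant_correction(r, h):
    """Amount B_r * h^r / r to subtract from the grouped cumulant of order r >= 2."""
    B = bernoulli_numbers(r)
    return B[r] * Fraction(h) ** r / r


if __name__ == "__main__":
    for r in (2, 3, 4, 5, 6):
        print(r, sheppard_cumulant_correction(r, 1))
```

Exact rational arithmetic is used so that the coefficients can be compared directly with the fractions quoted for the second, fourth and sixth cumulants.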




Grouping Corrections when the Distribution is Abrupt

3.27. Various writers have considered the corrections to be applied when one or 
both terminals of the distribution do not obey the Sheppard conditions for terminal contact. 
References are given at the end of this chapter. 


Multivariate Moments and Cumulants 


3.28. The foregoing results in this chapter may be readily generalised to the multivariate
case. To save complicating the algebraic expressions, we shall deal with
two variates $x_1$ and $x_2$; but the reader will have little difficulty in carrying out any
generalisations for more variates.

The bivariate moment $\mu'_{rs}$ about an origin $a_1$ for $x_1$ and $a_2$ for $x_2$ is defined by

$$\mu'_{rs} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x_1-a_1)^r(x_2-a_2)^s\,dF. \qquad (3.52)$$

If one of r, s is zero the moment becomes the ordinary univariate moment of the row or
column-border distribution of the bivariate population. In the contrary case we meet
a new type of moment, the product-moment. The first product-moment $\mu'_{11}$ is of particular
importance in the theory of correlation. The first product-moment about the variate
means, $\mu_{11}$, is known as the Covariance.

As in the univariate case, bivariate moments about certain points can be expressed
in terms of those about other points. If the $x_1$ origin is transferred from $a_1$ to $b_1$, where
$c_1 = b_1 - a_1$, and the $x_2$ origin from $a_2$ to $b_2$, where $c_2 = b_2 - a_2$, we have

$$\mu'_{rs}(a_1, a_2) = (\mu' + c_1)^r(\mu' + c_2)^s \qquad (3.53)$$

where the product $\mu'^{j}\mu'^{k}$ on the right is to be replaced by $\mu'_{jk}(b_1, b_2)$. This corresponds to
the symbolic equation

$$\mu'_r(a) = \{\mu'(b) + c\}^r$$

for the univariate case.

Methods of calculating the product-moments for numerically specified distributions
will be considered in Chapter 14. The determination of bivariate moments for a mathematically
specified population is a matter of evaluating double sums or double integrals,
and no new statistical points call for comment.


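Example 3.15

Consider the bivariate distribution

$$dF = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left(\frac{x_1^2}{\sigma_1^2} - \frac{2\rho x_1x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)\right\}dx_1\,dx_2, \qquad -\infty < x_1, x_2 < \infty.$$

Let us evaluate

$$M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1+t_2x_2}\,dF.$$

Making the substitution

$$\xi = x_1 - \sigma_1^2t_1 - \rho\sigma_1\sigma_2t_2, \qquad \eta = x_2 - \rho\sigma_1\sigma_2t_1 - \sigma_2^2t_2,$$

we find

$$M(t_1,t_2) = \exp\{\tfrac{1}{2}(t_1^2\sigma_1^2 + 2\rho t_1t_2\sigma_1\sigma_2 + t_2^2\sigma_2^2)\}\times
\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\int\!\!\int\exp\left\{-\frac{1}{2(1-\rho^2)}\left(\frac{\xi^2}{\sigma_1^2} - \frac{2\rho\xi\eta}{\sigma_1\sigma_2} + \frac{\eta^2}{\sigma_2^2}\right)\right\}d\xi\,d\eta$$

$$= \exp\{\tfrac{1}{2}(t_1^2\sigma_1^2 + 2\rho t_1t_2\sigma_1\sigma_2 + t_2^2\sigma_2^2)\}.$$

Now $\mu_{rs}$ is the coefficient of $\dfrac{t_1^rt_2^s}{r!\,s!}$ in $M(t_1, t_2)$ and thus we find, for instance,

$$\mu_{20} = \sigma_1^2, \qquad \mu_{11} = \rho\sigma_1\sigma_2, \qquad \mu_{02} = \sigma_2^2,$$

$$\mu_{30} = \mu_{21} = \mu_{12} = \mu_{03} = 0,$$

$$\mu_{40} = 3\sigma_1^4, \qquad \mu_{31} = 3\rho\sigma_1^3\sigma_2, \qquad \mu_{22} = (1+2\rho^2)\sigma_1^2\sigma_2^2,$$

$$\mu_{13} = 3\rho\sigma_1\sigma_2^3, \qquad \mu_{04} = 3\sigma_2^4.$$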
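
The moments listed in Example 3.15 can be checked without expanding the generating function. The sketch below uses a different route from the text, namely the pairing rule for zero-mean normal variates (often called Isserlis' theorem), under which a product moment is the sum over all pairings of products of covariances. The function names are illustrative assumptions.

```python
from fractions import Fraction


def gaussian_moment(indices, cov):
    """E[x_{i1} x_{i2} ...] for zero-mean jointly normal variates.

    Pairing rule: pair the first variate with each of the others in turn
    and recurse on what remains.
    """
    if not indices:
        return Fraction(1)
    if len(indices) % 2:
        return Fraction(0)  # odd-order central moments vanish
    first, rest = indices[0], indices[1:]
    total = Fraction(0)
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        total += cov[first][partner] * gaussian_moment(remaining, cov)
    return total


def bivariate_normal_moment(r, s, sigma1, sigma2, rho):
    """mu_{rs} of the bivariate normal distribution with zero means."""
    c12 = rho * sigma1 * sigma2
    cov = [[sigma1 ** 2, c12], [c12, sigma2 ** 2]]
    return gaussian_moment((0,) * r + (1,) * s, cov)


if __name__ == "__main__":
    s1, s2, rho = Fraction(2), Fraction(3), Fraction(1, 2)
    for r, s in [(2, 0), (1, 1), (2, 1), (4, 0), (3, 1), (2, 2)]:
        print((r, s), bivariate_normal_moment(r, s, s1, s2, rho))
```

With rational inputs the comparison with $(1+2\rho^2)\sigma_1^2\sigma_2^2$ and the other closed forms is exact.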


3.29. The bivariate analogue of equation (3.22) may be written 


көйү Korta Krs 
I RE. +. . ug +. Sd" 


а, fih Л Bats espe 
tw att ARS. S Җ ` BSA) 


"t symbolically, 


1 
exp [23 T). = ^us + 2.) (3.5 
! * — « (8.55 


where Kh +t)? = ру p t р кр, Py, 


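In terms of characteristic functions we may define

$$\phi(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{it_1x_1+it_2x_2}\,dF \qquad (3.57)$$

and, as before, write

$$\phi(t_1, t_2) = \sum_{r,s=0}^{\infty}\mu'_{rs}\frac{(it_1)^r(it_2)^s}{r!\,s!}
= \exp\left\{\sum_{r+s\geq 1}\kappa_{rs}\frac{(it_1)^r(it_2)^s}{r!\,s!}\right\} \qquad (3.58)$$

subject to conditions of existence.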
From these equations the bivariate moments can be e 


3.30. Wold (1934) has given the following e i 
bivariate moments and cumuiants, the variates being оте ӨЛ i 
intervals hy, hy. 


Я r 8 J T s 
Hs = Mn ya = 1)(21~k _ 27 
PA EA yv. ( 8,8. E ker cu - (3.59) 


j-0 k=0 




In particular 


"P 24 , m , a? 2. 1 
AA Lo = Bn — Beh, иц Ee Mon = Rs — 3808; 


Hao = Hag — ont Шә = B iit, and two symmetrical equations 


] Е LH 7 ; ы. evi Li T 5 + . (3.00) 
Hag = у — B3 T a Ha = 31 — Buys and two symmetrical equations 
д> ahe a - 
H22 = рог — зоб — Meng t Tay ) 
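For cumulants we have

$$\kappa_{rs} = \bar{\kappa}_{rs}, \qquad r, s > 0,$$

$$\kappa_{r0} = \bar{\kappa}_{r0} - \frac{B_r}{r}h_1^r, \qquad r > 1, \qquad (3.61)$$

$$\kappa_{0s} = \bar{\kappa}_{0s} - \frac{B_s}{s}h_2^s, \qquad s > 1.$$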

Measures of Skewness 


3.31. We have considered measures of location and dispersion in Chapter 2. With
the aid of the moments we can now proceed to consider measures of other qualities of the
population, and in particular its departure from symmetry.

In a symmetrical population, mean, median and mode coincide. It is thus natural
to take the deviation mean to mode or mean to median as measuring the skewness of the
distribution. K. Pearson proposed the measure

$$\mathrm{Sk} = \frac{\text{mean} - \text{mode}}{\sigma} \qquad (3.62)$$

which is subject to the inconvenience of determining the mode. For a wide class of
frequency-distributions known as Pearson's (cf. Chapter 6), this measure may, however, be
expressed exactly in terms of the first four moments. We define

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}. \qquad (3.63)$$

Then it may be shown that for Pearson curves

$$\mathrm{Sk} = \frac{\sqrt{\beta_1}\,(\beta_2+3)}{2(5\beta_2 - 6\beta_1 - 9)} \qquad (3.64)$$

and this equation may be taken as defining a measure of skewness applicable to all
distributions whose moments up to and including the fourth exist.

The coefficient $\beta_1$ itself is also a measure of skewness. Clearly if the distribution is
symmetrical it vanishes since $\mu_3$ vanishes, and the size of $\mu_3^2$ relative to $\mu_2^3$ will
indicate the extent of the departure from symmetry.

Generally we may define

$$\beta_{2n+1} = \frac{\mu_3\,\mu_{2n+3}}{\mu_2^{n+3}}, \qquad \beta_{2n} = \frac{\mu_{2n+2}}{\mu_2^{n+1}}, \qquad (3.65)$$

quantities which are not in general use but will be found to occur occasionally in statistical
literature.

More convenient quantities than $\beta_1$ and $\beta_2$ for certain purposes are

$$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}, \qquad (3.66)$$

$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\mu_2^2} - 3. \qquad (3.67)$$

If the distribution is expressed in standard measure, $\gamma_1$ and $\gamma_2$ are its third and fourth
cumulants.


Kurtosis

3.32. In the so-called "normal" distribution

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}\,dx, \qquad -\infty \leq x \leq \infty,$$

$\beta_2$ attains the value 3 and $\gamma_2$ is zero. Curves for which $\gamma_2 = 0$ are called Mesokurtic.
Those for which $\gamma_2 > 0$ are called Leptokurtic and will, relative to the normal curve, be
sharply peaked. Those for which $\gamma_2 < 0$ are called Platykurtic and will be flat-topped.


Example 3.16

For the distribution of Australian marriages considered in Example 3.1 we found
for the raw moments about the mean, in units of three years,

$$\bar{\mu}_2 = 7.056{,}977, \qquad \bar{\mu}_3 = 36.151{,}595, \qquad \bar{\mu}_4 = 408.738{,}210.$$

With Sheppard's corrections these become

$$\mu_2 = 6.973{,}644, \qquad \mu_3 = 36.151{,}595, \qquad \mu_4 = 405.238{,}888.$$

From these values we find

$$\beta_1 = 3.854, \qquad \gamma_1 = 1.963,$$
$$\beta_2 = 8.333, \qquad \gamma_2 = 5.333,$$

indicating considerable skewness and leptokurtosis.
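
The arithmetic of Example 3.16 can be retraced in a few lines. This is a minimal sketch, assuming the usual Sheppard corrections for moments about the mean ($\mu_2 = \bar{\mu}_2 - h^2/12$, $\mu_3 = \bar{\mu}_3$, $\mu_4 = \bar{\mu}_4 - \tfrac{1}{2}h^2\bar{\mu}_2 + \tfrac{7}{240}h^4$) and taking the grouping width as h = 1 in the working unit of three years; the function names are illustrative.

```python
def sheppard_central(m2, m3, m4, h=1.0):
    """Sheppard-corrected second, third and fourth moments about the mean."""
    c2 = m2 - h ** 2 / 12
    c3 = m3  # the third moment about the mean needs no correction
    c4 = m4 - 0.5 * h ** 2 * m2 + 7 * h ** 4 / 240
    return c2, c3, c4


def shape_coefficients(m2, m3, m4):
    """Return (beta_1, beta_2, gamma_1, gamma_2) from moments about the mean."""
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    gamma1 = m3 / m2 ** 1.5
    gamma2 = beta2 - 3
    return beta1, beta2, gamma1, gamma2


if __name__ == "__main__":
    raw = (7.056977, 36.151595, 408.738210)  # values quoted in Example 3.16
    corrected = sheppard_central(*raw)
    print(corrected)
    print(shape_coefficients(*corrected))
```

Passing a different `h` covers the case in which the moments are expressed in a unit other than the grouping interval.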


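Example 3.17

From the formulae for the moments of the binomial distribution considered in Example
3.2 we find

$$\beta_1 = \frac{(q-p)^2}{npq}, \qquad \beta_2 = 3 + \frac{1-6pq}{npq},$$

$$\gamma_1 = \frac{q-p}{\sqrt{npq}}, \qquad \gamma_2 = \frac{1-6pq}{npq},$$

so that, as $n \to \infty$, $\gamma_1$ and $\gamma_2 \to 0$. This accords with the fact
that the binomial tends to the normal form as n tends to infinity.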
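
The closed forms of Example 3.17 can be compared with values computed directly from the binomial frequencies. The following is a hedged sketch; the function names are illustrative, and the direct computation simply sums over the n + 1 variate-values.

```python
from math import comb, sqrt


def binomial_gammas(n, p):
    """gamma_1 and gamma_2 computed directly from the binomial frequencies."""
    q = 1 - p
    pmf = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * f for k, f in enumerate(pmf))

    def mu(r):
        # r-th moment about the mean
        return sum((k - mean) ** r * f for k, f in enumerate(pmf))

    m2, m3, m4 = mu(2), mu(3), mu(4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def closed_form_gammas(n, p):
    """gamma_1 = (q - p)/sqrt(npq), gamma_2 = (1 - 6pq)/(npq)."""
    q = 1 - p
    return (q - p) / sqrt(n * p * q), (1 - 6 * p * q) / (n * p * q)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, binomial_gammas(n, 0.3), closed_form_gammas(n, 0.3))
```

Increasing n in the loop shows both coefficients shrinking towards zero, in line with the approach to the normal form.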




Moments as Characteristics of a Distribution 


3.33. The use of moments and cumulants in determining the nature of a frequency- 
distribution will be abundantly illustrated in later chapters, but some general remarks may 
be made at this stage. 

It has been noted that the characteristic function determines the moments when 
they exist, and it will be proved in Chapter 4 that the characteristic function also deter- 
mines the distribution function. It does not, however, follow that the moments completely 
determine the distribution, even when moments of all orders exist. Only under certain 
conditions will a set of moments determine a distribution uniquely, but, fortunately for 
statisticians, those conditions are obeyed by all the distributions arising in statistical practice. 
For all ordinary purposes, therefore, a knowledge of the moments, when they all exist, is 
equivalent to a knowledge of the distribution function : equivalent, that is, in the sense that 
it should be possible theoretically to exhibit all the properties of the distribution in terms 
of the moments. 


3.34. In particular we expect that if two distributions have a certain number of
moments in common they will bear some resemblance to each other. If, say, moments
up to those of order n are identical we know that as n tends to infinity the distributions
approach identity, and consequently we expect that by identifying the lower moments
of two distributions we bring them to approximate equality. Some mathematical support
for this so-called Principle of Moments may be derived from the following approach:

It is known that a function which is continuous in a finite range a to b can be represented
in that range by a uniformly convergent series of polynomials in x, say $\sum_{n=0}^{\infty}P_n(x)$,
where $P_n(x)$ is of degree n. Suppose we wish to represent such a function approximately
by the finite series of powers $\sum_{j=0}^{n}a_jx^j$. The coefficients $a_j$ may be determined by the
principle of least squares, i.e. so as to make

$$\int_a^b\Bigl(f - \sum_j a_jx^j\Bigr)^2dx \qquad (3.68)$$

a minimum. Differentiating by $a_k$ we have

$$-2\int_a^b\Bigl(f - \sum_j a_jx^j\Bigr)x^k\,dx = 0$$

or

$$\int_a^b f\,x^k\,dx = \mu'_k = \sum_j a_j\int_a^b x^{j+k}\,dx. \qquad (3.69)$$

If now two distributions have moments up to order n equal they must have the same
least-squares approximation, for the coefficients $a_j$ are determined by the moments in
virtue of (3.69). Furthermore, if in the range the distribution $f_1$ differs from $\sum a_jx^j$ by
$\epsilon_1$ and $f_2$ by $\epsilon_2$, then $f_1$ differs from $f_2$ by not more than $\epsilon_1 + \epsilon_2$.

A similar line of approach may be adopted when the range is infinite, the distributions
in such cases being, under certain general conditions, capable of representation by a series
of terms such as $e^{-x^2}P_n(x)$. (Cf. Chapter 6.) The same conclusion is reached.

Distributions which have a finite number of the lower moments in common will,
in a sense, be approximations one to another. We shall encounter many cases where,
although we cannot determine a distribution function explicitly, we may ascertain its
moments at least up to some order; and hence we shall be able to approximate to the
distribution by finding another distribution of known form which has the same lower
moments. In practice, approximations of this kind often turn out to be remarkably good,
even when only the first three or four moments are equated.
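
The statement that the least-squares coefficients depend on the function only through its moments can be made concrete. The sketch below solves the linear equations (3.69) for the coefficients given the moments $\mu'_0, \dots, \mu'_n$ over a finite range; the solver and the function names are illustrative assumptions, and exact rational arithmetic is used because the coefficient matrix is badly conditioned.

```python
from fractions import Fraction


def solve(A, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for i in range(n):
            if i != col:
                factor = M[i][col]
                M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]


def least_squares_from_moments(moments, lower, upper):
    """Coefficients a_0..a_n of the least-squares polynomial on (lower, upper).

    moments[k] is the integral of f(x) * x^k over the range, k = 0..n.
    """
    n = len(moments) - 1
    lo, hi = Fraction(lower), Fraction(upper)

    def power_integral(k):
        # integral of x^k over (lower, upper)
        return (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)

    A = [[power_integral(j + k) for j in range(n + 1)] for k in range(n + 1)]
    return solve(A, [Fraction(m) for m in moments])


if __name__ == "__main__":
    # f(x) = 6x(1 - x) on (0, 1) has moments 6 / ((k + 2)(k + 3)).
    moments = [Fraction(6, (k + 2) * (k + 3)) for k in range(3)]
    print(least_squares_from_moments(moments, 0, 1))
```

Any two functions supplying the same list `moments` receive the same polynomial, which is the point made in the text.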


Mean Values 


3.35. To conclude this chapter we may note that the moments are particular cases
of a general class of functions known as Mean Values. If we have a function $\psi(x)$ defined
in the range of a distribution, then

$$\int_{-\infty}^{\infty}\psi(x)\,dF \qquad (3.70)$$

if it exists, is called the mean value of $\psi(x)$ for that distribution; it is sometimes written
as $E\{\psi(x)\}$, a notation we shall often find useful. The moment of order r is thus the mean
value of $x^r$ and the characteristic function is the mean value of $e^{itx}$. The letter E in this
connection is the first letter of the word "expectation," and mean values as we have defined
them are sometimes known as "expected" values, particularly in the theory of probability.
The objection to this practice is that only rarely is it to be expected that we shall meet with
the "expected" value in sampling.


3.36. Two important properties of mean values are to be noted. In the first place,
if we have two functions $\psi_1(x)$ and $\psi_2(x)$,

$$\int\psi_1\,dF + \int\psi_2\,dF = \int(\psi_1+\psi_2)\,dF$$

and thus

$$E(\psi_1+\psi_2) = E(\psi_1) + E(\psi_2), \qquad (3.71)$$

i.e. the mean value of a sum is the sum of the mean values.

Secondly, if we have two independent variates $x_1$ and $x_2$ distributed with functions
$F_1$, $F_2$; and if $\psi_1$ is a function of $x_1$ and $\psi_2$ of $x_2$, then

$$\int\!\!\int\psi_1\psi_2\,dF_1\,dF_2 = \int\psi_1\,dF_1\int\psi_2\,dF_2$$

or,

$$E(\psi_1\psi_2) = E(\psi_1)\,E(\psi_2), \qquad (3.72)$$

so that the mean value of the product is the product of the mean values. This is in general
only true if the variates are independent, whereas (3.71) is subject to no
such restriction.

NOTES AND REFERENCES 


In most of the literature what have here been called "cumulants" are referred to as
semi-invariants or seminvariants. They were introduced by Thiele (1889), who, however,
failed to draw a clear distinction between the parameters of a population and estimates
of those parameters from a sample, with the result that there has been some confusion
between semi-invariant parameters and semi-invariant statistics. (This is in no way to
be interpreted as a criticism of Thiele, who could hardly have been expected to be
years ahead of his time.) Some recent work by Dressel (1940) has shown the desirability
of reserving the name "seminvariant" for the more general class of parameters which


are, except for powers of кз, invariant under transformations of the origin. Dressel points 
out the analogy between such parameters and the functions of the coefficients of the binary 
form 


а" + (Dear v Г 55556 Ө Ж Tee GY", 


which are invariant under transformations of type 
é =le tm, у= 1. 


The word “ seminvariant ” has been in use for many years in the theory of algebraic 
invariants to denote such functions. The word “ cumulant ” is due to Fisher and Wishart. 

A comprehensive account of the mathematical relations between moments, factorial 
moments and cumulants is given by Frisch (1926). 

There is an extensive literature on corrections for grouping. Kendall (1938) gave 
a bibliography which appears to be complete except for the omission of a paper by Fisher 
(1921) and one by Elderton (1938b). For corrections in the case when the Sheppard 
conditions are violated, see Pairman and Pearson (1918), Sandon (1924), Martin (1934), 
Pearse (1928) and Elderton (1938a). For Sheppard’s corrections for a discrete variable 
(which appear to be due to H. C. Carver) see Craig (1936); and for the corrections in the 
multivariate case see Wold (1934b). 

References to the problem of moments (i.e. the conditions under which a set of constants 
can form the moments of a distribution) are given at the end of Chapter 4. As to the 
mathematical basis of the principle of moments, see Merzrath (1933) and Romanovsky (1936). 


Craig, C. C. (1936), "Sheppard's corrections for a discrete variable," Ann. Math. Statist.,
7, 55.

Dressel, P. L. (1940), "Statistical seminvariants and their estimates with particular
emphasis on their relation to algebraic invariants," Ann. Math. Statist., 11, 33.

Elderton, Sir W. P. (1938a), Frequency Curves and Correlation, Cambridge University Press.

Elderton, Sir W. P. (1938b), "Correzioni dei momenti quando la curva è simmetrica," Giorn. dell'Ist.
Ital. Att., 16, 145.

Fisher, R. A. (1921), "On the mathematical foundations of theoretical statistics," Phil.
Trans., A, 222, 309.

Frisch, R. (1926), "Sur les semi-invariants et moments employés dans l'étude des distributions
statistiques," Oslo; Skrifter af det Norske Videnskaps-Akademie, II,
Hist.-Filos. Klasse, No. 3.

Kendall, M. G. (1938), "The conditions under which Sheppard's corrections are valid,"
J. Roy. Statist. Soc., 101, 592.

Kendall, M. G. (1941), "The derivation of multivariate sampling formulae from univariate formulae
by symbolic operation," Ann. Eugen. Lond., 10, 392.

Martin, E. S. (1934), "On the correction for moment coefficients of frequency distributions
when the start of the frequency is one of the characteristics to be determined,"
Biometrika, 26, 12.

Merzrath, E. (1933), "Anpassung von Flächen an zweidimensionale Kollektivgegenstände
und ihre Auswertung für die Korrelationstheorie," Metron, 11, No. 2, 108.

Pairman, E., and Pearson, K. (1918), "On corrections for the moment-coefficients of
limited range frequency-distributions when there are finite or infinite ordinates
and any slopes at the terminals of the range," Biometrika, 12, 231.

Pearse, G. E. (1928), "On corrections for the moment-coefficients of frequency-distributions
when there are infinite ordinates at one or both terminals of the range," Biometrika,
20A, 314.

Romanovsky, V. (1936), "Note on the method of moments," Biometrika, 28, 188.

Sandon, F. (1924), "Note on the simplification of the calculation of the abruptness coefficients
to correct crude moments," Biometrika, 16, 193.

Shohat, J. (1929), "Inequalities for moments of frequency functions and for various
statistical constants," Biometrika, 21, 361.

Thiele, T. N. (1889), Theory of Observations (reprinted in English in Ann. Math. Statist.,
1931, 2, 165).

Wold, H. (1934a), "Sulla correzione di Sheppard," Giorn. dell'Ist. Ital. Att., 4, 304.

Wold, H. (1934b), "Sheppard's correction formulae in several variables," Skand. Aktuar.,
17, 248.

EXERCISES 
3.1. Show that the rth moment about the origin of the distribution

$$dF = k\,x^{-p}e^{-\gamma/x}\,dx, \qquad 0 < x < \infty, \quad \gamma > 0,$$

is

$$\mu'_r = \frac{\gamma^r\,\Gamma(p-r-1)}{\Gamma(p-1)}$$

if $r < p - 1$, and does not exist in the contrary case.


3.2. In the distribution 


i- Т g2N-m d 
DELE) e" ad; — OKA о 


show that, about the origin, 


: 
" > ар, A 
ik = har cos 2-7-2 0 inr 0 g-9? 40 
> 


and hence that : 


f a 
y o. N laa Dzi}. 


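3.3. Show that the discontinuous distribution whose frequencies corresponding to
the variate-values 0, 1, 2, ..., r, ... are

$$e^{-m}\left\{1,\; m,\; \frac{m^2}{2!},\; \dots,\; \frac{m^r}{r!},\; \dots\right\}$$

has, for the moments about the mean,

$$\mu_2 = m, \quad \mu_3 = m, \quad \mu_4 = m(1+3m), \quad \mu_5 = m(1+10m), \quad \mu_6 = m(1+25m+15m^2).$$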
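
The polynomials in Exercise 3.3 can be generated mechanically as a check on hand algebra. The sketch below relies on a recurrence that is not stated in the text and is used here as an assumption: for this distribution, $\mu_{r+1} = m\,(r\mu_{r-1} + d\mu_r/dm)$. Polynomials in m are stored as lists of integer coefficients, index k holding the coefficient of $m^k$; the function name is illustrative.

```python
def poisson_central_moments(r_max):
    """Moments about the mean mu_0..mu_r_max as polynomials in m.

    Each polynomial is a list of integer coefficients, index = power of m.
    """

    def deriv(p):
        return [k * c for k, c in enumerate(p)][1:] or [0]

    def add(p, q):
        n = max(len(p), len(q))
        return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                for i in range(n)]

    mu = [[1], [0]]  # mu_0 = 1, mu_1 = 0
    for r in range(1, r_max):
        # Assumed recurrence: mu_{r+1} = m * (r * mu_{r-1} + d(mu_r)/dm)
        inner = add([r * c for c in mu[r - 1]], deriv(mu[r]))
        mu.append([0] + inner)  # multiplying by m shifts the coefficients
    return mu[:r_max + 1]


if __name__ == "__main__":
    for r, poly in enumerate(poisson_central_moments(6)):
        print(r, poly)
```

For example, the list `[0, 1, 3]` stands for $m + 3m^2 = m(1+3m)$.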


3.4. Show that for the distribution whose frequencies for variate-values 0, 1, ..., n
are the successive terms in $(\tfrac{1}{2}+\tfrac{1}{2})^n$, i.e.
$(\tfrac{1}{2})^n\left\{1, \binom{n}{1}, \binom{n}{2}, \dots, 1\right\}$, all cumulants of odd
order except the first vanish.



3.5. Show generally that the cumulants of odd order vanish for any symmetrical 
distribution, except the first. 


3.6. Show that e may be expanded in an infinite series, valid in — oo < v < co, 


Р gel giri 
1 + (e#—1) at 4 (e'—1)? = Ee Egit ay = Tel 


the factorials being taken with unit interval; and hence that 


Hir) = [@"ф(Ф)]=о 


d 
where d = ae" 


Hence show that, for the binomial (q + p)" about the origin, 


fy = ap. 


3.7. Show that the distribution whose frequency at the variate-value + 2r (r integral) is 


-3a] € E а?г+? а?+4+ 
$ le потр 330v 4330 ^ + 


and at + (27 + 1) is 


№ en qna 


arts 
| {ек Ей + л + айка tes} 
| | *has odd-order cumulants equal to zero and even-order cumulants equal to 2a, 
3.8. Show that for the distribution

$$dF = \frac{1}{\sigma}\,e^{-x/\sigma}\,dx, \qquad 0 \leq x \leq \infty,$$

$$\kappa_r = \sigma^r\,(r-1)!$$
3.9. Show that 
. “үм JN 
Е РЕ 2. $ 1) 


and hence that 


k, = (— 1y-i | us 




3.10. Show that for the distribution 
dF = dx, [E 


grouped into an integral number of intervals of e 
and fourth moments about the mean are 


qual width A, the corrections to the second % 


80 
(Cf. Elderton, 19385. Note that the first is exactly, and the second approximately, the 
Sheppard correction with sign reversed.) 


3.11. If 2, stands for the operator such that 


ди, == LU ү» T2p 


= r<p 
and @, is distributive when applied to products, e.g. 
(AB) = B(2,4) + A(2,B), 


y 
show that д, annihilates every cumulant (considered as a function of the moments) except ^ 


кр, and that 


05k, = р! | 
(Cf. Kendall, 1941.) 


3.12. If f(x) is an odd function of x of period $\tfrac{1}{2}$, show that

$$\int_0^{\infty} x^r\,x^{-\log x}f(\log x)\,dx = 0$$

for all integral values of r. Hence show that the distributions

$$dF = x^{-\log x}\{1 - \lambda\sin(4\pi\log x)\}\,dx, \qquad 0 \leq x \leq \infty, \quad 0 \leq \lambda \leq 1,$$

have the same moments whatever the value of $\lambda$. (Stieltjes. See refs. to Chapter 4.)


3.13. Show that if the frequencies of a discontinuous distribution are distributed 


at equal intervals m ™ in. each grouping interval л, the average grouping corrections to “ә 


em BE Lr) 


т m* 


the cumulants are given by 


(Cf. Craig, 1936.) 


3.14. Liapounoff's inequality for moments. Beginning with the inequality 
(Zab)? < (2a?)(Xb2) 


show that for positive values a, . . . vy 


(z^ Y < e) (ze), 




Hence that 

(24 @ Py < (Xx*)(Xw- . . . (аә) 
is true when p is of form 2", Hence show that it is true for any integral p by noting that 
if 2" is the smallest power of 2 greater than p we may take 


Gad o. + tp 
941 — | 0+9 = . Xam = p 
Hence, putting p =& — с, =... = ар —06 аьр... ас = а, show that 
"ox ya: byte, 


(The inequality remains true for a continuous variate, as may be seen by considering limiting 
processes.) 


3.15. Show that for the bivariate distribution 


2 От, а 
dF = g exp п fi = RNG їн 3 dx, di, — © [Tyta L o 
oi 


2no,c,(l — p?)* qs p?)loi 0365 
all cumulants кш, 7, s> 2, vanish; and further, if 
dE 
dis = mm 


Ars = (r qe 1)p2,-1, 8—1 ШЕ (7 — 1)(s ae 1) = p*)4,-2, 5—8 


_ (2r)1(28)! x 2p) 
л ш 2 0 —316 =} 


r++) x 


1 (20)? 
Jor, 2541 D 


^e 0-26 — 28i T 0 
Ao», 2s+1 = 2,1, 2s = 0, 


where ¢ is the smaller of r and s. In particular, 


An = k 2а. = Зо, Asi = l5p, А = 105p, Ay = 945р 5 
Аз = (1 + 2р2), day = 3(1 + 4р2), А = 15(1 + бр?) 
Ass = 105(1 + 8р?), As, 10 = 945(1 + 10р?) ; 

Ass = 3p(3 + 2p*), Ass = 15p(3 + 4p?), 

Ала = 3(3 + 24р? -+ S21). 


3.16. (The Gauss-Winckler inequality.) If f(x) is a continuous non-increasing
frequency function ranging from 0 to $\infty$, show that the rth moment about the origin is
equal to $f(0)\,p_{r+1}/(r+1)$ where $p_{r+1}$ is the (r + 1)th moment, about the origin, of the
function $1 - f(x)/f(0)$, which is a distribution function. Hence show that for any frequency
function which is continuous and has a single mode the moments about the mode obey
the relation

$$\{(r+1)\,\nu_r\}^{1/r} \leq \{(n+1)\,\nu_n\}^{1/n}$$

if $r < n$. In particular $\mu_4/\mu_2^2 \geq 1.8$. (See von Mises, J. für reine und ang. Math., 1931,
165, 184.)


CHAPTER 4 
CHARACTERISTIC FUNCTIONS 


Moment- and Cumulant-Generating Functions 


4.1. In the previous chapter we considered the characteristic function

$$\phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF \qquad (4.1)$$

as a moment-generating function. We have also

$$\psi(t) = \log\phi(t) = \sum_{r=1}^{\infty}\kappa_r\frac{(it)^r}{r!}, \qquad (4.2)$$

$\psi(t)$ being known as the Cumulative Function. It generates the cumulants in the same
way that the characteristic function generates the moments. If the moment of order r
exists, $\phi(t)$ can be expanded in powers of t at least as far as the term in $(it)^r$, and so can $\psi(t)$.

Other functions can be constructed which generate the moments. For example,
since for $|tx| < 1$

$$\frac{1}{1-tx} = \sum_{j=0}^{\infty}t^jx^j$$

we have the formal expansion

$$\int_{-\infty}^{\infty}\frac{dF}{1-tx} = \sum_{j=0}^{\infty}\mu'_jt^j. \qquad (4.3)$$

Generally if a function c(t) can be expanded as a power series in t, $\sum a_jt^j$, we have, subject
to existence (and convergence when the series is infinite),

$$\int_{-\infty}^{\infty}c(tx)\,dF = \sum a_j\mu'_jt^j. \qquad (4.4)$$

Since

$$(1+t)^x = \sum_{j=0}^{\infty}\frac{x^{[j]}t^j}{j!}$$

we have

$$\omega(t) = \int_{-\infty}^{\infty}(1+t)^x\,dF = \sum_{j=0}^{\infty}\mu'_{[j]}\frac{t^j}{j!} \qquad (4.5)$$

and thus $\omega(t)$ may be regarded as a factorial moment-generating function. We may also
define a factorial cumulant-generating function

$$\log\omega(t) = \sum_{j=1}^{\infty}\kappa_{[j]}\frac{t^j}{j!} \qquad (4.6)$$

though this function has not come into general use.
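
The relation between the two generating functions in 4.1 gives a direct way of passing from moments to cumulants. The sketch below uses the recursion $\kappa_n = \mu'_n - \sum_{k=1}^{n-1}\binom{n-1}{k-1}\kappa_k\,\mu'_{n-k}$, which follows from differentiating $\phi = e^{\psi}$ and comparing coefficients; it is offered as an illustration, not as the method of the text, and the function name is an assumption.

```python
from math import comb


def cumulants_from_moments(raw):
    """Return [kappa_1, ..., kappa_n] from raw = [1, mu'_1, ..., mu'_n]."""
    n_max = len(raw) - 1
    kappa = [0] * (n_max + 1)  # index 0 is unused
    for n in range(1, n_max + 1):
        kappa[n] = raw[n] - sum(
            comb(n - 1, k - 1) * kappa[k] * raw[n - k] for k in range(1, n)
        )
    return kappa[1:]


if __name__ == "__main__":
    # Moments about the origin of the distribution of Exercise 3.3 with m = 2.
    print(cumulants_from_moments([1, 2, 6, 22, 94]))
    # Moments of the normal distribution in standard measure.
    print(cumulants_from_moments([1, 0, 1, 0, 3]))
```

Integer or `Fraction` inputs keep the computation exact.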


4.2. The generation of moments is by no means the most important property of the
characteristic function, and in this chapter we discuss some of the theorems which give
it a fundamental place in statistical theory.

We recall, in the first instance, that $\phi(t)$ always exists, since

$$\left|\int_{-\infty}^{\infty}e^{itx}\,dF\right| \leq \int_{-\infty}^{\infty}\left|e^{itx}\right|dF = \int_{-\infty}^{\infty}dF = 1, \qquad (4.7)$$

so that the defining integral converges absolutely. Further, $\phi(t)$ is uniformly continuous
in t and differentiable j times under the integral sign if the resulting expressions exist
and are uniformly convergent, for which it is sufficient that $\nu_j$ exists. For then

$$\left|\phi^{(j)}(t)\right| = \left|\int_{-\infty}^{\infty}(ix)^je^{itx}\,dF\right| \leq \int_{-\infty}^{\infty}|x|^j\,dF = \nu_j. \qquad (4.8)$$



The Inversion Theorem

4.3. We now prove the fundamental theorem of the theory of characteristic functions,
which will be called the Inversion Theorem, namely that the characteristic function uniquely
determines the distribution function; more precisely, if $\phi(t)$ is given by (4.1) then

$$F(x) - F(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\phi(t)\,\frac{1-e^{-ixt}}{it}\,dt, \qquad (4.9)$$

the integral being understood as a principal value, i.e. as

$$\lim_{c\to\infty}\frac{1}{2\pi}\int_{-c}^{c}\phi(t)\,\frac{1-e^{-ixt}}{it}\,dt.$$

Further, if F(x) is continuous everywhere and $dF = f(x)\,dx$,

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\phi(t)\,e^{-ixt}\,dt, \qquad (4.10)$$

the integral, as before, being a principal value if there is not separate convergence at the
limits. Equation (4.10) may be compared with the form

$$\phi(t) = \int_{-\infty}^{\infty}f(x)\,e^{itx}\,dx, \qquad (4.11)$$

the comparison exhibiting the kind of reciprocal relationship which exists between f(x)
and $\phi(t)$.

As a preliminary we require an integral due to Dirichlet. It is easy to show that

$$\int_0^{\infty}\frac{\sin x}{x}\,dx = \frac{\pi}{2}.$$

Putting

$$u_n = \left|\int_{n\pi}^{(n+1)\pi}\frac{\sin x}{x}\,dx\right|$$

we have

$$\int_0^{\infty}\frac{\sin x}{x}\,dx = u_0 - u_1 + u_2 - \dots + (-1)^nu_n + \dots$$

in which the terms decrease monotonically to zero in absolute value. Now let H(x) be
a positive decreasing function. Consider

$$I_p = \int_0^{\infty}H\!\left(\frac{x}{p}\right)\frac{\sin x}{x}\,dx = \sum_{n=0}^{\infty}\int_{n\pi}^{(n+1)\pi}H\!\left(\frac{x}{p}\right)\frac{\sin x}{x}\,dx. \qquad (4.12)$$

Writing H(+0) as the limit of $H(\epsilon)$ as $\epsilon \to 0$ ($\epsilon$ positive), we have, in virtue of the decreasing
property of H, that any term in the series on the right in (4.12) is not greater than $u_nH(+0)$
in absolute value. Further, as the series alternates in sign, the difference between $I_p$ and
$\int_0^{n\pi}H\!\left(\frac{x}{p}\right)\frac{\sin x}{x}\,dx$ is less than $u_nH(+0)$, which tends to zero uniformly in p.
Consequently $I_p$ is uniformly convergent and we have

$$\lim_{p\to\infty}I_p = \int_0^{\infty}\lim_{p\to\infty}H\!\left(\frac{x}{p}\right)\frac{\sin x}{x}\,dx = \frac{\pi}{2}H(+0). \qquad (4.13)$$

Similarly we have

$$\lim_{p\to\infty}\int_{-\infty}^{0}H\!\left(\frac{x}{p}\right)\frac{\sin x}{x}\,dx = \frac{\pi}{2}H(-0). \qquad (4.14)$$

By a simple change of sign the results are seen to be true if H(x) is a negative increasing
function. It is therefore true of any function which can be expressed as the sum or difference
of a positive decreasing function and a negative increasing function, and in particular
of a frequency function or distribution function.

Adding (4.13) and (4.14) and writing H(0) for $\tfrac{1}{2}\{H(+0) + H(-0)\}$ we have

$$\lim_{p\to\infty}\int_{-\infty}^{\infty}H\!\left(\frac{x}{p}\right)\frac{\sin x}{x}\,dx = \pi H(0), \qquad (4.15)$$

and putting px for x in this expression,

$$\lim_{p\to\infty}\int_{-\infty}^{\infty}H(x)\,\frac{\sin px}{x}\,dx = \pi H(0), \qquad (4.16)$$

the so-called Dirichlet integral. If H(x) is continuous at x = 0 the value
$\tfrac{1}{2}\{H(+0) + H(-0)\}$ is of course the usual value H(x = 0).
Now consider

$$J_c = \int_{-c}^{c}\phi(t)\,dt\int_0^{X}e^{-it\xi}\,d\xi. \qquad (4.17)$$

Putting

$$\phi(t) = \int_{-\infty}^{\infty}e^{itx}\,dF(x)$$

we have

$$J_c = \int_{-c}^{c}dt\left\{\int_{-\infty}^{\infty}e^{itx}\,dF(x)\int_0^{X}e^{-it\xi}\,d\xi\right\}.$$

The product in curly brackets may be equated to the double integral

$$\int_0^{X}\int_{-\infty}^{\infty}e^{it(x-\xi)}\,dF(x)\,d\xi$$

which is evidently uniformly convergent. Making the transformation

$$y = x - \xi, \qquad z = \xi,$$

we have

$$\int_0^{X}\int_{-\infty}^{\infty}e^{ity}\,dF(y+z)\,dz$$

and integrating with respect to z,

$$\int_{-\infty}^{\infty}e^{ity}\Bigl[F(y+z)\Bigr]_0^{X}dy = \int_{-\infty}^{\infty}e^{ity}\{F(y+X) - F(y)\}\,dy.$$

This also is uniformly convergent in t and hence, integrating under the integral sign with
respect to t, we have

$$J_c = \int_{-\infty}^{\infty}\frac{e^{icy}-e^{-icy}}{iy}\{F(y+X) - F(y)\}\,dy
= \int_{-\infty}^{\infty}\frac{2\sin cy}{y}\{F(y+X) - F(y)\}\,dy, \qquad (4.18)$$

since $\cos cy$ is an even function.

Now (4.18) is a Dirichlet integral and we have, therefore,

$$\lim_{c\to\infty}J_c = 2\pi\{F(X) - F(0)\}. \qquad (4.19)$$

Referring again to (4.17) we thus have, writing now x for X,

$$F(x) - F(0) = \frac{1}{2\pi}\lim_{c\to\infty}\int_{-c}^{c}\phi(t)\,dt\int_0^{x}e^{-it\xi}\,d\xi$$

and integrating with respect to $\xi$,

$$F(x) - F(0) = \frac{1}{2\pi}\lim_{c\to\infty}\int_{-c}^{c}\phi(t)\,\frac{1-e^{-itx}}{it}\,dt,$$

which is the result stated in (4.9). It is to be remembered that in virtue of our convention
in arriving at (4.16), F(x) at a saltus is $\tfrac{1}{2}\{F(x+) + F(x-)\}$.


4.4. This expression may be thrown into an alternative form. From the definition
of $\phi(t)$ it is seen that $\phi(t)$ and $\phi(-t)$ are conjugate quantities, and we thus have

$$R(t) = \tfrac{1}{2}\{\phi(t) + \phi(-t)\},$$

$$I(t) = \frac{1}{2i}\{\phi(t) - \phi(-t)\},$$

R and I being the real and imaginary parts of $\phi(t)$. Thus

$$F(x) - F(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\{R(t) + iI(t)\}\,\frac{1-e^{-ixt}}{it}\,dt$$

and by a change of sign in t,

$$F(x) - F(0) = \frac{1}{\pi}\int_0^{\infty}\{R(t)\sin xt + I(t)(1-\cos xt)\}\,\frac{dt}{t}. \qquad (4.20)$$

This integral is, of course, real.
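
The real form of the inversion integral lends itself to numerical evaluation. The sketch below applies Simpson's rule to the half-line integral, truncated at an upper limit `t_max`; the truncation and the step count are assumptions suitable only for characteristic functions that die away quickly, and the function names are illustrative.

```python
from math import cos, erf, exp, pi, sin, sqrt


def simpson(g, a, b, n):
    """Composite Simpson's rule with n (made even) sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def distribution_increment(R, I, x, t_max=40.0, n=20000):
    """F(x) - F(0) from the real part R and imaginary part I of phi(t)."""

    def integrand(t):
        if t == 0.0:
            # Limiting values: sin(xt)/t -> x and (1 - cos xt)/t -> 0.
            return R(0.0) * x
        return (R(t) * sin(x * t) + I(t) * (1 - cos(x * t))) / t

    return simpson(integrand, 0.0, t_max, n) / pi


if __name__ == "__main__":
    # Normal distribution in standard measure: phi(t) = exp(-t^2 / 2).
    R = lambda t: exp(-t * t / 2)
    I = lambda t: 0.0
    print(distribution_increment(R, I, 1.0), 0.5 * erf(1 / sqrt(2)))
```

The second printed value is the same increment obtained from the error function, for comparison.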


4.5. If now F(x) has a derivative f(x) we have

$$f(x) = \frac{1}{2\pi}\frac{d}{dx}\lim_{c\to\infty}\int_{-c}^{c}\phi(t)\,\frac{1-e^{-ixt}}{it}\,dt.$$

The integral being uniformly convergent in x, the differentiation can be carried out on
the integrand, and we have

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\phi(t)\,e^{-itx}\,dt,$$

the integral being a principal value.


4.6. Consider now the expression, with a slightly different definition of $J_c$,

$$J_c = \int_{-c}^{c}\phi(t)\,e^{-itx}\,dt.$$

If the distribution function F has a derivative f, this is equal in the limit to $2\pi f(x)$, so that

$$\lim_{c\to\infty}\frac{J_c}{2c} = \lim_{c\to\infty}\frac{\pi}{c}f(x) = 0$$

and consequently $\dfrac{J_c}{2c}$ tends to zero everywhere where F(x) is continuous
and differentiable, i.e. if the frequency-distribution is continuous.

If, however, the distribution is discontinuous, consider one point of discontinuity,
say the frequency $f_j$ at $x_j$. The contribution of this part of the frequency to $\phi(t)$ will be
$f_je^{itx_j}$, and thus the contribution to $\dfrac{J_c}{2c}$ will be

$$\frac{1}{2c}f_j\int_{-c}^{c}e^{itx_j}e^{-itx}\,dt = f_j\,\frac{\sin c(x_j-x)}{c(x_j-x)}.$$

If $x \neq x_j$ this clearly tends to zero; but if $x = x_j$ it becomes

$$\frac{1}{2c}f_j\int_{-c}^{c}dt = f_j.$$

Thus the function $\dfrac{J_c}{2c}$ tends to $f_j$ at $x = x_j$.

Hence, if $\dfrac{J_c}{2c}$ tends to zero at a point x, there is no discontinuity in the distribution
function at that point; but if it tends to a positive number $f_j$, the distribution function
is discontinuous at that point and the frequency is $f_j$. This gives us a criterion whether
a given characteristic function represents a continuous distribution or not.
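
The criterion can be watched at work on a purely discontinuous distribution. The sketch below evaluates $J_c/2c$ from the closed form $f_j\sin c(x_j-x)/\{c(x_j-x)\}$ summed over the points of discontinuity; the three-point distribution used is invented for illustration, and the function name is an assumption.

```python
from math import sin


def jump_probe(atoms, x, c):
    """J_c / 2c for a discrete distribution given as {x_j: f_j}."""
    total = 0.0
    for xj, fj in atoms.items():
        d = c * (xj - x)
        # At x = x_j the ratio sin(d)/d is replaced by its limiting value 1.
        total += fj if d == 0 else fj * sin(d) / d
    return total


if __name__ == "__main__":
    atoms = {0: 0.2, 1: 0.5, 3: 0.3}  # hypothetical frequencies
    for c in (10.0, 1e3, 1e6):
        print(c, jump_probe(atoms, 1, c), jump_probe(atoms, 2, c))
```

As c increases the first column settles on the frequency at the variate-value 1 and the second, taken at a point carrying no frequency, dies away.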


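Example 4.1

We found in Example 3.10 that the characteristic function of the normal distribution

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}\,dx, \qquad -\infty \leq x \leq \infty,$$

is $\phi(t) = e^{-\frac{1}{2}t^2\sigma^2}$.

Suppose we are given such a function and require to find the distribution, if any, of which
it is the characteristic function.

In the first place we note that the distribution, if any, is continuous. For

$$\frac{J_c}{2c} = \frac{1}{2c}\int_{-c}^{c}e^{-\frac{1}{2}t^2\sigma^2}e^{-itx}\,dt.$$

The integral is less in modulus than $\int_{-c}^{c}e^{-\frac{1}{2}t^2\sigma^2}\,dt$, which is less than
$\int_{-\infty}^{\infty}e^{-\frac{1}{2}t^2\sigma^2}\,dt = \dfrac{\sqrt{2\pi}}{\sigma}$.
Thus $\dfrac{J_c}{2c} \to 0$ everywhere. We have then for the frequency function, if any,

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-\frac{1}{2}t^2\sigma^2}e^{-itx}\,dt
= \frac{1}{2\pi}\,e^{-x^2/2\sigma^2}\int_{-\infty}^{\infty}\exp\left\{-\tfrac{1}{2}\left(t\sigma + \frac{ix}{\sigma}\right)^2\right\}dt.$$

This may be regarded as an integral in the complex plane along a line parallel to the
real axis. Taking $t\sigma + \dfrac{ix}{\sigma}$ as the new variable in place of t, we find that the integral is
in fact

$$\frac{1}{\sigma}\int_{-\infty}^{\infty}e^{-\frac{1}{2}u^2}\,du = \frac{\sqrt{2\pi}}{\sigma}.$$

Thus

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}.$$

This is everywhere positive and $\int dF$ converges. Hence it is in fact a frequency function
with the given expression as a characteristic function.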


Example 4.2 
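To find the frequency function, if any, for which

$$\phi(t) = e^{-|t|}.$$

We note that $J_c/2c$ tends to zero and that the distribution, if any, is continuous. We then have for f(x), if it exists,

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-|t|}\,e^{-itx}\,dt
= \frac{1}{2\pi}\left[\int_{0}^{\infty} e^{-t}e^{-itx}\,dt + \int_{-\infty}^{0} e^{t}e^{-itx}\,dt\right]$$

$$= \frac{1}{2\pi}\int_{0}^{\infty} e^{-t}\,(e^{itx} + e^{-itx})\,dt
= \frac{1}{\pi}\int_{0}^{\infty} e^{-t}\cos tx\,dt.$$

This may be evaluated by two partial integrations. We find

$$f(x) = \frac{1}{\pi}\Big[-e^{-t}\cos tx\Big]_{0}^{\infty} - \frac{x}{\pi}\int_{0}^{\infty} e^{-t}\sin tx\,dt$$

$$= \frac{1}{\pi} - \frac{x}{\pi}\Big[-e^{-t}\sin tx\Big]_{0}^{\infty} - \frac{x^2}{\pi}\int_{0}^{\infty} e^{-t}\cos tx\,dt$$

$$= \frac{1}{\pi} - x^2 f(x).$$

Thus

$$f(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty < x < \infty.$$

As before, this function can represent a frequency function, and it is readily verified that f(x) has, in fact, the required characteristic function.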
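The inversion in Example 4.2 may be checked by direct quadrature. The sketch below is illustrative only: the helper names, the truncation of the integral at an assumed upper limit and the use of Simpson's rule are choices made here, and the routine applies only to a real, even characteristic function such as $e^{-|t|}$.

```python
import math

def invert_even_cf(cf, x, upper=40.0, steps=40000):
    """Simpson estimate of f(x) = (1/pi) * integral_0^upper cf(t) cos(tx) dt.

    Valid for a real, even characteristic function; `steps` must be even and
    `upper` large enough for the neglected tail to be negligible.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * cf(t) * math.cos(t * x)
    return total * h / (3.0 * math.pi)

def example_cf(t):
    """The characteristic function of Example 4.2."""
    return math.exp(-abs(t))

def closed_form_density(x):
    """The frequency function found in Example 4.2."""
    return 1.0 / (math.pi * (1.0 + x * x))

if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 3.0):
        print(x, invert_even_cf(example_cf, x), closed_form_density(x))
```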


Example 4.3 
Does there exist a frequency function for which

$$\phi(t) = e^{it}\ ?$$

We have

$$\frac{J_c}{2c} = \frac{1}{2c}\int_{-c}^{c} e^{it}\,e^{-itx}\,dt = \frac{1}{2c}\int_{-c}^{c} e^{it(1-x)}\,dt.$$

If $1 - x$ is not zero the integral is

$$\int_{-c}^{c}\big[\cos\{(1-x)t\} + i\sin\{(1-x)t\}\big]\,dt.$$

Since sin t is an odd function this is equal to

$$\int_{-c}^{c}\cos\{(1-x)t\}\,dt = \left[\frac{\sin\{(1-x)t\}}{1-x}\right]_{-c}^{c}.$$

This does not converge, but it is bounded and hence $J_c/2c \to 0$.

If, however, $x = 1$, the integral is simply

$$\int_{-c}^{c} dt$$

and thus $J_c/2c = 1$.

Thus there is unit frequency at $x = 1$ and it is seen at once that this accounts for the whole of the frequency, so that there is no frequency elsewhere. The distribution thus consists of a unit at $x = 1$. This is otherwise evident from the consideration that $\log\phi(t) = it$, so that the second cumulant is zero and there is no dispersion.


Example 4.4 


For what distribution, if any, are the cumulants given by $\kappa_r = (r-1)!$ ?
The series

$$\sum_{r=1}^{\infty}\kappa_r\frac{(it)^r}{r!} = \sum_{r=1}^{\infty}\frac{(it)^r}{r}$$

converges absolutely for $|t| < 1$ and is thus equal to $\psi(t)$ if such a function exists.

We have

$$\psi(t) = -\log(1 - it)$$

and thus

$$\phi(t) = \frac{1}{1 - it}.$$

If the frequency function exists we have

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-ixt}}{1 - it}\,dt.$$

This integral may be evaluated by integrating the complex function $e^{-ixz}/(1 - iz)$ round a contour consisting of the real axis and the infinite semicircle below that axis. The first part reduces to the integral we are seeking. On the semicircle of radius R we have $z = R(\cos\theta + i\sin\theta)$ and the integrand becomes

$$\frac{\exp(-ixR\cos\theta + xR\sin\theta)}{1 - iR\cos\theta + R\sin\theta}.$$

$\theta$ here lies between $\pi$ and $2\pi$ and hence $\sin\theta$ is negative. Hence if x is positive the expression is not greater in modulus than

$$\frac{e^{-xR|\sin\theta|}}{R - 1},$$

i.e. tends to zero as $R \to \infty$.

Now the function $e^{-ixz}/(1 - iz)$ has a pole within the domain of integration at $z = -i$ and the residue there is $ie^{-x}$. Hence, the contour being described in the negative sense,

$$f(x) = \frac{1}{2\pi}\,(-2\pi i)(ie^{-x}) = e^{-x}, \qquad 0 \le x < \infty.$$

More generally, if $\kappa_r = p\,(r-1)!$, $p > 0$, it will be found that the residue of $e^{-ixz}/(1 - iz)^p$ is $i\,x^{p-1}e^{-x}/\Gamma(p)$, so that the distribution is

$$f(x) = \frac{x^{p-1}e^{-x}}{\Gamma(p)}, \qquad 0 \le x < \infty,\ p > 0.$$
Example 4.5 


For what distribution, if any, are all cumulants of odd order zero and those of even order a constant, say 2a ?
We have

$$\psi(t) = 2a\left\{\frac{(it)^2}{2!} + \frac{(it)^4}{4!} + \ldots\right\}.$$

This series converges and

$$\psi(t) = 2a(\cos t - 1).$$

Hence

$$\phi(t) = e^{2a(\cos t - 1)}.$$

If we try to integrate $\int e^{2a(\cos t - 1)}\,e^{-itx}\,dt$ in the ordinary way we fail. Let us then look into the question of continuity of the distribution function.
We have

$$J_c = e^{-2a}\int_{-c}^{c} e^{2a\cos t}\,e^{-itx}\,dt
= e^{-2a}\int_{-c}^{c}\sum_{j=0}^{\infty}\frac{(2a)^j\cos^j t}{j!}\,e^{-itx}\,dt.$$

The series is uniformly convergent and hence

$$J_c = e^{-2a}\sum_{j=0}^{\infty}\frac{(2a)^j}{j!}\int_{-c}^{c}\cos^j t\;e^{-itx}\,dt
= e^{-2a}\sum_{j=0}^{\infty}\frac{(2a)^j}{j!}\int_{-c}^{c}\cos^j t\,\cos xt\,dt,$$

since sin xt is an odd function.

Consider now the integral $\int_{-c}^{c}\cos^j t\,\cos xt\,dt$. By a well-known expansion

$$2^{j+1}\cos^j t\,\cos xt = (e^{it} + e^{-it})^j\,(e^{ixt} + e^{-ixt})
= \left(e^{ijt} + j\,e^{i(j-2)t} + \ldots + e^{-ijt}\right)(e^{ixt} + e^{-ixt}).$$

The only part of this expression of present interest is the constant term, the others not contributing more than a finite amount to $J_c$. The coefficient of $e^{0}$ is zero unless x is integral in absolute value, and in that case is

$$2\binom{j}{\tfrac{1}{2}(j - |x|)},$$

the binomial coefficient being taken as zero unless $\tfrac{1}{2}(j - |x|)$ is a non-negative integer.

Thus $J_c/2c$ tends to zero unless x is integral in absolute value, and in the latter case tends to

$$e^{-2a}\sum_{j}\frac{a^j}{j!}\binom{j}{\tfrac{1}{2}(j - |x|)}.$$

Thus, if x is even, say 2r, the frequency at $x = \pm 2r$ is

$$e^{-2a}\left\{\frac{a^{2r}}{(2r)!} + \frac{a^{2r+2}}{(2r+1)!\,1!} + \frac{a^{2r+4}}{(2r+2)!\,2!} + \ldots\right\},$$

and if x is odd the frequency at $x = \pm(2r+1)$ is

$$e^{-2a}\left\{\frac{a^{2r+1}}{(2r+1)!} + \frac{a^{2r+3}}{(2r+2)!\,1!} + \frac{a^{2r+5}}{(2r+3)!\,2!} + \ldots\right\}.$$

We may now verify that these frequencies account for the whole of the characteristic function and hence that all frequencies have been found.
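The verification mentioned at the end of Example 4.5 can be carried out numerically. The sketch below assumes the series for the frequency at $x = \pm k$ in the form $e^{-2a}\sum_m a^{k+2m}/\{m!\,(k+m)!\}$, which covers both the even and the odd case; the function names, the truncation limits and the trial value of a are illustrative assumptions.

```python
import math

def frequency(k, a, terms=40):
    """Frequency at x = k (k any integer): e^{-2a} * sum_m a^{|k|+2m} / (m! (|k|+m)!)."""
    k = abs(k)
    series = sum(
        a ** (k + 2 * m) / (math.factorial(m) * math.factorial(k + m))
        for m in range(terms)
    )
    return math.exp(-2.0 * a) * series

def cf_from_frequencies(t, a, kmax=30):
    """Sum of frequency(k) * cos(kt); the sine terms cancel because the frequencies are symmetric."""
    return sum(frequency(k, a) * math.cos(k * t) for k in range(-kmax, kmax + 1))

def target_cf(t, a):
    """The characteristic function of Example 4.5."""
    return math.exp(2.0 * a * (math.cos(t) - 1.0))

if __name__ == "__main__":
    a = 1.5
    print(sum(frequency(k, a) for k in range(-30, 31)))
    for t in (0.3, 1.0, 2.5):
        print(t, cf_from_frequencies(t, a), target_cf(t, a))
```

The truncation limits are adequate only for moderate values of a; larger values would need more terms.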


Conditions for a Function to be a Characteristic Function 


4.7. Any function which is not negative in its range of definition and which is integrable in the Stieltjes sense can be a frequency function; and any non-decreasing function which increases from 0 to 1 in its range of definition can be a distribution function. There are much more restrictive conditions to be obeyed before a given function can be a characteristic function.

In the first place, let us note that it is a necessary and sufficient condition for a function $\phi(t)$ to be a characteristic function that

$$F(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{-itx}}{it}\,\phi(t)\,dt$$

shall (except for an additive constant F(0)) be a distribution function. This, however, is not a very helpful criterion in practice.

Looking to the definition of $\phi(t)$ as $\int_{-\infty}^{\infty} e^{itx}\,dF$, we see that necessary conditions for $\phi(t)$ to be a characteristic function are

(a) that $\phi(t)$ must be continuous in t,

(b) that $\phi(t)$ is defined in every finite t interval,

(c) that $\phi(0) = 1$,

(d) that $\phi(t)$ and $\phi(-t)$ shall be conjugate quantities,

(e) that $|\phi(t)| \le \int |e^{itx}|\,dF \le 1 = \phi(0)$.

These criteria enable us to reject certain functions as possible characteristic functions, 
but there do not appear to be any readily applicable sufficient conditions which enable 
us to determine at sight whether a given function can be a characteristic function. 
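Some of the necessary conditions of 4.7 lend themselves to a mechanical check on a grid of values of t. The sketch below tests only $\phi(0) = 1$, the conjugacy of $\phi(t)$ and $\phi(-t)$, and the bound $|\phi(t)| \le 1$; the function name, the tolerance and the trial functions are assumptions made for illustration. Passing the check does not show that a function is a characteristic function, since the conditions are necessary only.

```python
import math

def failed_conditions(cf, ts, tol=1e-9):
    """Return the labels of the conditions cf(0) = 1, conjugacy and |cf| <= 1
    that the candidate function cf violates somewhere on the grid ts."""
    failed = []
    if abs(cf(0.0) - 1.0) > tol:
        failed.append("phi(0) = 1")
    if any(abs(cf(-t) - complex(cf(t)).conjugate()) > tol for t in ts):
        failed.append("conjugacy")
    if any(abs(cf(t)) > 1.0 + tol for t in ts):
        failed.append("modulus")
    return failed

GRID = [0.1 * k for k in range(1, 101)]

if __name__ == "__main__":
    print(failed_conditions(lambda t: math.exp(-0.5 * t * t), GRID))  # normal form
    print(failed_conditions(lambda t: 1.0 + t * t, GRID))             # exceeds unity
    print(failed_conditions(lambda t: math.exp(t), GRID))             # real exponential
    print(failed_conditions(lambda t: 0.5 * math.exp(-t * t), GRID))  # wrong value at 0
```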


Limiting Properties of Distribution and Characteristic Functions 

4.8. Suppose there is given a sequence of distribution functions $F_n(x)$ depending on a parameter n which can increase indefinitely. To each $F_n$ there will correspond a characteristic function $\phi_n$. The question to be discussed is this: if $F_n$ tends to a limit F, will $\phi_n$ tend to a limit $\phi$ and is $\phi$ the characteristic function of F? Conversely, if $\phi_n$ tends to a limit $\phi$, does $F_n$ tend to a limit F and is F a distribution function having $\phi$ for its characteristic function? The answers to these questions, as will be seen below, are affirmative under certain general conditions.

It is to be noted what is meant by a distribution function tending to another. If both are continuous, $F_n(x)$ is said to tend to F(x) if, given any $\epsilon$, there is an $n_0$ such that $|F_n(x) - F(x)| < \epsilon$ for all $n \ge n_0$. If there are discontinuities present, $F_n$ will be said to tend to F if it does so in every point of continuity of F. Since by definition our functions are taken to be continuous on the left at saltuses, this evidently conforms to the definition for the continuous case and to the common-sense requirements of the situation.


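4.9. We require two preliminary theorems for later work. The first is that if $F_n$ tends to a continuous F it does so uniformly.

For the range can be divided into a finite number of parts, say at $\xi_1, \xi_2, \ldots, \xi_k$, such that $F(\xi_{j+1}) - F(\xi_j) < \tfrac{1}{2}\epsilon$ for all j. Then as n increases there will come a time when $|F_n(\xi_j) - F(\xi_j)| < \tfrac{1}{2}\epsilon$ for all j. Thus there exists an $n_0$ such that for $n > n_0$

$$|F_n(\xi_j) - F(\xi_j)| < \tfrac{1}{2}\epsilon.$$

It is sufficient to show that this implies that for any x

$$|F_n(x) - F(x)| < \epsilon, \qquad -\infty \le x \le \infty.$$

In fact, if x lies between $\xi_j$ and $\xi_{j+1}$,

$$F(\xi_j) \le F(x) \le F(\xi_{j+1}) < F(\xi_j) + \tfrac{1}{2}\epsilon$$

and

$$F(\xi_j) - \tfrac{1}{2}\epsilon < F_n(\xi_j) \le F_n(x) \le F_n(\xi_{j+1}) < F(\xi_{j+1}) + \tfrac{1}{2}\epsilon < F(\xi_j) + \epsilon$$

and thus

$$-\epsilon < F_n(x) - F(x) < \epsilon,$$

which is the required result.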




4.10. The second theorem we require (the Montel-Helly theorem) is that if the sequence $F_n(x)$ is monotonic and bounded for all x (which is so for distribution functions) then we can pick out a subsequence $F_{n'}(x)$ which converges to some monotonic increasing function F (not necessarily a distribution function itself, for it may not vary from 0 to 1).

Consider first of all a series of values $x_1, x_2, \ldots$, which for the purposes of what follows may be taken to be the rational points arranged in a sequence. It is known that every bounded set of numbers contains a convergent sequence. Hence we can pick out from the sequence $F_n(x_1)$ a convergent sequence, say, $F_{n_1}(x_1)$. Then from the subsequence $F_{n_1}(x_2)$ we can pick out a subsequence $F_{n_2}(x_2)$, and $F_{n_2}(x)$ is thus convergent at both $x_1$ and $x_2$. Continuing in this way we may, by picking out the first function in $F_{n_1}(x)$, the second in $F_{n_2}(x)$, and so on, arrive at a sequence of functions $G_1(x), G_2(x), \ldots$ which converges at each of the values $x_1, x_2, \ldots$, etc. This is the so-called Weierstrassian diagonal process.

It follows that the sequence $G_n$ is convergent at every rational point x. Since $G_n(a) \le G_n(x) \le G_n(b)$ for every x between a and b, we see that if $G_n(a)$ and $G_n(b)$ converge, the limiting values of $G_n(x)$ lie between those limits, say G(a) and G(b).

Then the function u(x) = upper bound of $G_n(x)$ (x not necessarily rational) is well defined and non-decreasing and so has no more than an enumerable number of points of discontinuity. If u is continuous at x, we take y and z such that $y < x < z$ and $u(z) - u(y) < \epsilon$. Then if a and b are rational points such that $y < a < x < b < z$ it follows that $u(y) \le G(a) \le G(b) \le u(z)$. Moreover, as all the limiting values of $G_n(x)$ are between G(a) and G(b), they are between u(y) and u(z). Hence, as $\epsilon$ can be arbitrarily small, we see that $G_n(x)$ tends to u(x) at every point of continuity of u. Finally, by the diagonal process, we can select a sequence which will also be convergent at the points of discontinuity of u(x). The theorem is established.


The First Limit Theorem 


4.11. We now prove the theorem: if a sequence of distribution functions $F_n$ tends to a continuous distribution function F, then the corresponding sequence of characteristic functions $\phi_n$ tends to $\phi$ uniformly in any finite t-interval, where $\phi$ is the characteristic function of F.

It is required to prove that, given $\epsilon$, there is an $n_0$ independent of t such that

$$|\phi(t) - \phi_n(t)| = \left|\int_{-\infty}^{\infty} e^{itx}\,(dF - dF_n)\right| < \epsilon, \qquad n > n_0. \qquad (4.21)$$

Select two points of continuity of F, X and $-X$. We can make X as large as we please. We then split the integral

$$\int_{-\infty}^{\infty} e^{itx}\,(dF - dF_n) \qquad (4.22)$$

into two parts, that in the range $-X$ to $+X$ and that in the remaining portion of the range. Now

$$\left|\int_{|x| > X} e^{itx}\,dF\right| \le \int_{|x| > X} dF \le 1 - F(X) + F(-X),$$

and by taking X large enough we can make this quantity less than $\tfrac{1}{4}\epsilon$.

Similarly

$$\left|\int_{|x| > X} e^{itx}\,dF_n\right| \le 1 - F_n(X) + F_n(-X),$$

and since $F_n$ tends to F (and that uniformly) this, for some large n, will be less than $\tfrac{1}{4}\epsilon$. Hence for some n, the portion of (4.22) outside the range $-X$ to $+X$ will be less in modulus than $\tfrac{1}{4}\epsilon + \tfrac{1}{4}\epsilon = \tfrac{1}{2}\epsilon$. Consider now the other part

$$\int_{-X}^{X} e^{itx}\,(dF - dF_n). \qquad (4.23)$$

This expression is the limit of the sum

$$\sum_j e^{it\xi'_j}\big[\{F(\xi_{j+1}) - F(\xi_j)\} - \{F_n(\xi_{j+1}) - F_n(\xi_j)\}\big], \qquad (4.24)$$

$\xi_j$, $\xi_{j+1}$ being the boundaries of the interval into which the range is subdivided and $\xi'_j$ a value in that interval. The difference between this sum and the limiting value can be made less than $\tfrac{1}{4}\epsilon$ if the intervals are small enough; for if they are less than $\eta$ in width the difference of $e^{it\xi_j}$ and $e^{it\xi'_j}$ is less in modulus than $\eta|t|$, by the mean value theorem, and thus in any t-range $\pm T$ the difference of (4.23) and (4.24) is less in modulus than

$$\eta T\sum_j\big|\{F(\xi_{j+1}) - F(\xi_j)\} - \{F_n(\xi_{j+1}) - F_n(\xi_j)\}\big| \le 2\eta T,$$

which is less than $\tfrac{1}{4}\epsilon$ if $\eta < \epsilon/8T$.

Now the sum (4.24) will itself be less than $\tfrac{1}{4}\epsilon$ for some $n > n_0$, for it is the sum of a finite number of terms each of which tends to zero. Consequently (4.23) is less than $\tfrac{1}{2}\epsilon$ and hence

$$|\phi(t) - \phi_n(t)| < \epsilon, \qquad n > n_0.$$


Converse of the First Limit Theorem 


4.12. The converse result is even more important:

Let $\phi_n$ be a sequence of characteristic functions corresponding to the sequence of distribution functions $F_n$. Then if $\phi_n(t)$ tends to $\phi(t)$ for all real t, uniformly* in some finite t-interval, $F_n$ tends to a distribution function F and $\phi$ is the characteristic function of F.

* Or equivalently, if $\phi(t)$ is continuous at t = 0 or if $\phi(t)$ is a characteristic function.

As a preliminary lemma, let us prove that if F is a distribution function with characteristic function $\phi$, then for all real $\xi$ and all $h > 0$

$$\frac{1}{h}\int_{\xi}^{\xi+h} F(u)\,du - \frac{1}{h}\int_{\xi-h}^{\xi} F(u)\,du = \frac{1}{\pi}\int_{-\infty}^{\infty}\left(\frac{\sin t}{t}\right)^2 e^{-2it\xi/h}\,\phi\!\left(\frac{2t}{h}\right)dt. \qquad (4.25)$$

In fact, put

$$G(x) = \frac{1}{h}\int_{x}^{x+h} F(u)\,du.$$

This is a continuous distribution function and its characteristic function is

$$\int_{-\infty}^{\infty} e^{itx}\,dG = \frac{1}{h}\int_{-\infty}^{\infty} e^{itx}\,\{F(x+h) - F(x)\}\,dx,$$

which by a partial integration becomes

$$\left[\frac{e^{itx}}{ith}\{F(x+h) - F(x)\}\right]_{-\infty}^{\infty} - \frac{1}{ith}\int_{-\infty}^{\infty} e^{itx}\,\{dF(x+h) - dF(x)\}$$

$$= -\frac{1}{ith}\int_{-\infty}^{\infty}\{e^{it(x-h)}\,dF(x) - e^{itx}\,dF(x)\}$$

$$= \phi(t)\,\frac{1 - e^{-ith}}{ith}.$$

Substituting for G(x) in (4.9) we get

$$\frac{1}{h}\int_{\xi+h}^{\xi+2h} F(u)\,du - \frac{1}{h}\int_{\xi}^{\xi+h} F(u)\,du = \frac{1}{2\pi h}\int_{-\infty}^{\infty}\left(\frac{1 - e^{-ith}}{it}\right)^2 e^{-it\xi}\,\phi(t)\,dt,$$

whence, writing $\xi$ for $\xi + h$, we find

$$\frac{1}{h}\int_{\xi}^{\xi+h} F(u)\,du - \frac{1}{h}\int_{\xi-h}^{\xi} F(u)\,du = \frac{1}{2\pi h}\int_{-\infty}^{\infty}\left(\frac{1 - e^{-ith}}{it}\right)^2 e^{-it(\xi-h)}\,\phi(t)\,dt$$

$$= \frac{1}{4\pi}\int_{-\infty}^{\infty}\left(\frac{e^{it} - e^{-it}}{it}\right)^2 e^{-2it\xi/h}\,\phi\!\left(\frac{2t}{h}\right)dt$$

$$= \frac{1}{\pi}\int_{-\infty}^{\infty}\left(\frac{\sin t}{t}\right)^2 e^{-2it\xi/h}\,\phi\!\left(\frac{2t}{h}\right)dt,$$

the result announced in (4.25).

Reverting now to the theorem required to be proved, note that it is sufficient to establish that if $\phi_n \to \phi$ uniformly in some interval $|t| < a$, then $F_n$ tends to some distribution function F in every point of continuity of F. When this is established it follows from the First Limit Theorem that $\phi$ is the characteristic function of F and that $\phi_n$ converges to $\phi$ uniformly in every finite t-interval.

As shown in 4.10, given a sequence $F_n$ we may always choose from it a subsequence $F_{n'}$ such that $F_{n'}$ converges to a non-decreasing function F in every continuity point of F.

Let us then choose such a sequence. We have of necessity $0 \le F \le 1$, and F may be supposed everywhere continuous on the left. It is then a distribution function if $F(+\infty) - F(-\infty) = 1$, and this we proceed to prove.* From (4.25) with $\xi = 0$ we have

$$\frac{1}{h}\int_{0}^{h} F_{n'}(u)\,du - \frac{1}{h}\int_{-h}^{0} F_{n'}(u)\,du = \frac{1}{\pi}\int_{-\infty}^{\infty}\left(\frac{\sin t}{t}\right)^2\phi_{n'}\!\left(\frac{2t}{h}\right)dt.$$

* It is not obvious that if the functions $F_n$ all vary from 0 to 1, then their limit must do so. In fact, if $F_n(x) = 0$, $x < -n$; $F_n(x) = \tfrac{1}{2}$, $-n \le x < n$; $F_n(x) = 1$, $x \ge n$; then $\lim_{n\to\infty} F_n(x) = \tfrac{1}{2}$, $-\infty < x < \infty$.

By hypothesis $\phi_{n'}$ tends uniformly to $\phi$ for $|t| < a$ and hence $\phi_{n'}(2t/h)$ does so for $|t| < \tfrac{1}{2}ah$, and it is easily seen that the integral on the right is uniformly convergent. Thus, given $\epsilon$, we can find $h_0$ such that for $h > h_0$

$$\frac{1}{h}\int_{0}^{h} F(u)\,du - \frac{1}{h}\int_{-h}^{0} F(u)\,du = \frac{1}{\pi}\int_{-\frac{1}{2}ah}^{\frac{1}{2}ah}\left(\frac{\sin t}{t}\right)^2\phi\!\left(\frac{2t}{h}\right)dt + \delta,$$

where $|\delta| < \epsilon$. Now let h tend to infinity. As F is a non-decreasing function the left-hand side tends to $F(+\infty) - F(-\infty)$. The right-hand side tends, in virtue of the uniformity of $\phi_n$ and the consequent continuity of $\phi$ near t = 0, to

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\left(\frac{\sin t}{t}\right)^2 dt,$$

which is equal to unity.

Hence F, the limit of the subsequence $F_{n'}$, is a distribution function whose characteristic function is $\phi$.

But any subsequence of $\phi_n$ tends to $\phi$, in virtue of the uniformity of the convergence, and hence any convergent subsequence of $F_n$ tends to F. Consequently $F_n$ tends to F in every point of continuity of F and the theorem follows.


Example 4.6

The binomial distribution $(q + p)^n$ considered in Example 3.2 has the characteristic function

$$(q + pe^{it})^n.$$

Now the frequency at $x = j$ is $\binom{n}{j}q^{n-j}p^{j}$. This is greater than the ordinate at $x = j + 1$ if

$$\binom{n}{j}q^{n-j}p^{j} > \binom{n}{j+1}q^{n-j-1}p^{j+1},$$

or

$$j > pn - q.$$

For large n the maximum frequency will then be in the neighbourhood of $j = pn$, and is then

$$\frac{n!}{(pn)!\,(qn)!}\,q^{qn}p^{pn}.$$

In virtue of Stirling's approximation to the factorial this approximates to

$$\frac{n^n e^{-n}\sqrt{2\pi n}\;q^{qn}p^{pn}}{(pn)^{pn}e^{-pn}\sqrt{2\pi pn}\;(qn)^{qn}e^{-qn}\sqrt{2\pi qn}} = \frac{1}{\sqrt{2\pi pqn}}$$

and therefore tends to zero.

Thus every frequency in the binomial tends to zero and the distribution does not tend to any limiting distribution.

Suppose, however, that we express the distribution in standard measure. Putting $\xi = (x - \mu'_1)/\sigma$ we have

$$\phi_x(t) = \int e^{itx}\,dF(x) = \int e^{it(\sigma\xi + \mu'_1)}\,dF(\xi).$$

Hence

$$\phi_\xi(t) = e^{-it\mu'_1/\sigma}\,\phi_x\!\left(\frac{t}{\sigma}\right).$$

The effect on $\phi(t)$ of transferring to standard measure is then to replace t by $t/\sigma$ and to multiply by $e^{-it\mu'_1/\sigma}$.

For the binomial $\mu'_1 = np$, $\mu_2 = npq$, and thus the characteristic function of the binomial expressed in standard measure is

$$\phi(t) = \exp\left\{\frac{-itnp}{(npq)^{1/2}}\right\}\left\{q + p\exp\frac{it}{(npq)^{1/2}}\right\}^n.$$

Thus

$$\log\phi = \frac{-itnp}{(npq)^{1/2}} + n\log\left\{1 + p\left(e^{it/(npq)^{1/2}} - 1\right)\right\}$$

$$= \frac{-itnp}{(npq)^{1/2}} + n\left\{\frac{pit}{(npq)^{1/2}} - \frac{pt^2}{2npq} + \frac{p^2t^2}{2npq} + O(n^{-3/2})\right\}$$

$$= -\tfrac{1}{2}t^2 + O(n^{-1/2}).$$

Thus for any finite t, $\log\phi$ tends uniformly to $-\tfrac{1}{2}t^2$ and hence

$$\phi(t) \to e^{-\frac{1}{2}t^2}.$$

Thus the distribution $(q + p)^n$ expressed in standard measure tends to the distribution whose characteristic function is $e^{-\frac{1}{2}t^2}$, i.e. to the form

$$dF = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\,dx, \qquad -\infty < x < \infty.$$
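The convergence found in Example 4.6 can be watched numerically. The sketch below evaluates the characteristic function of the binomial in standard measure through its logarithm and compares it with $e^{-\frac{1}{2}t^2}$; the function names and the particular values of n, p and t are assumptions made for illustration.

```python
import cmath
import math

def standardised_binomial_cf(t, n, p):
    """exp(-itnp/s) * (q + p e^{it/s})^n with s = sqrt(npq), computed via its logarithm."""
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    log_phi = -1j * t * n * p / s + n * cmath.log(q + p * cmath.exp(1j * t / s))
    return cmath.exp(log_phi)

def gap_to_normal(t, n, p):
    """Modulus of the difference from the normal characteristic function exp(-t^2/2)."""
    return abs(standardised_binomial_cf(t, n, p) - math.exp(-0.5 * t * t))

if __name__ == "__main__":
    for n in (10, 100, 10000, 100000):
        print(n, gap_to_normal(1.0, n, 0.3))
```

The printed differences shrink roughly in proportion to $n^{-1/2}$, in agreement with the order of the neglected term.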


Multivariate Characteristic Functions 


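4.13. The characteristic function of a bivariate distribution $F(x_1, x_2)$ is defined as

$$\phi(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{it_1x_1 + it_2x_2}\,dF \qquad (4.26)$$

and generally, that of a multivariate distribution $F(x_1, x_2, \ldots, x_n)$ as

$$\phi(t_1, t_2, \ldots, t_n) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp(it_1x_1 + \ldots + it_nx_n)\,dF. \qquad (4.27)$$

If $x_1, x_2, \ldots, x_n$ are independent we have

$$\phi(t_1, t_2, \ldots, t_n) = \int e^{it_1x_1}\,dF_1(x_1)\int e^{it_2x_2}\,dF_2(x_2)\cdots\int e^{it_nx_n}\,dF_n(x_n)$$

$$= \phi_1(t_1)\,\phi_2(t_2)\cdots\phi_n(t_n). \qquad (4.28)$$

Similarly

$$\psi(t_1, t_2, \ldots, t_n) = \psi_1(t_1) + \psi_2(t_2) + \ldots + \psi_n(t_n). \qquad (4.29)$$

Thus the characteristic function of the joint distribution of a number of independent variables is the product of their characteristic functions and the cumulative function is the sum of their cumulative functions. This is a fundamentally important result in the theory of sampling.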


4.14. In generalisation of (4.9) we have

$$F(x_1, x_2, \ldots, x_n) - F(0, 0, \ldots, 0) = \frac{1}{(2\pi)^n}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\frac{1 - e^{-it_1x_1}}{it_1}\cdots\frac{1 - e^{-it_nx_n}}{it_n}\;\phi(t_1, t_2, \ldots, t_n)\,dt_1\ldots dt_n. \qquad (4.30)$$

The multiple integrals are to be interpreted as principal values

$$\lim_{c\to\infty}\int_{-c}^{c}\cdots\int_{-c}^{c}.$$

The proof is similar to that for the univariate case. We have
di ? sin ж, sin x, az 
Ј = TE эжи NU es OM 
. | 0 | о 4л Tn ү (5) 
lim f sies | Her эл з) i. LU ы dx, = 2"H(+0...+0) 
р —» о -%0 -0 p p E v 
0 20 1 1 H 
lim | Se d | EL i, + a.) pma жос он , da, = п"Н(0,.... 0). 
n—> a-o -o k v, 


Considering now 


J, = i hs Ji dives SOR ES. anf”. Я T exp(—it,é, ... — it EE, ... . d&, (4.81) 
we find that кР. | | 
a l " | Senen, а S50 OI A oe 
T ae ОА 


У "n 
Im J,-—(ePyQFe...-)—27(0...0) 
c со 
and by considering the integration of (4.31) with respect to the é’s the result (4.30) follows. 


4.15. If we have a distribution F(x) and some function of the variate such as $\xi(x)$ we may consider the characteristic function of $\xi$

$$\phi(t) = \int_{-\infty}^{\infty} e^{it\xi}\,dF(x). \qquad (4.32)$$

The distribution of $\xi$ will then be given by (4.9) or (4.10), e.g. the distribution function of $\xi$, say $G(\xi)$, is given by

$$dG = \frac{d\xi}{2\pi}\int_{-\infty}^{\infty}\exp(-it\xi)\,\phi(t)\,dt. \qquad (4.33)$$


The Problem of Moments 


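4.16. We can now consider in more detail a problem which suggested itself in Chapter 3. Do the moments determine the distribution uniquely, and if not, under what conditions do they do so? To give some point to this question let us note that in some circumstances it is possible for two different distributions to have the same set of moments.

Consider in fact the integral

$$\int_{0}^{\infty} t^{p-1}e^{-qt}\,dt = \frac{\Gamma(p)}{q^p}, \qquad p > 0,\ R(q) > 0.$$

Put

$$p = \frac{n+1}{\lambda}\ (n \text{ a non-negative integer}), \qquad 0 < \lambda < \tfrac{1}{2}, \qquad q = \alpha + i\beta, \qquad \frac{\beta}{\alpha} = \tan\lambda\pi, \qquad x^{\lambda} = t.$$

We find on substitution that

$$\int_{0}^{\infty} x^n e^{-\alpha x^{\lambda}}\,(\cos\beta x^{\lambda} - i\sin\beta x^{\lambda})\,\lambda\,dx = \frac{\Gamma\!\left(\frac{n+1}{\lambda}\right)}{\alpha^{(n+1)/\lambda}\,(1 + i\tan\lambda\pi)^{(n+1)/\lambda}} \qquad (4.34)$$

and since

$$(1 + i\tan\lambda\pi)^{(n+1)/\lambda} = \frac{\cos(n+1)\pi + i\sin(n+1)\pi}{(\cos\lambda\pi)^{(n+1)/\lambda}} = \text{a real quantity},$$

the imaginary part of (4.34) is zero. Thus the distributions

$$f(x) = k\,e^{-\alpha x^{\lambda}}\{1 + \epsilon\sin(\beta x^{\lambda})\}, \qquad (4.35)$$

$$0 \le x < \infty,\quad \alpha > 0,\quad 0 < \lambda < \tfrac{1}{2},\quad |\epsilon| < 1,$$

k being the constant which makes the total frequency unity, have moments independent of $\epsilon$, and (4.35) defines a whole family of distributions having the same moments.

Similarly, if we substitute

$$p = \frac{2n+1}{\mu}, \qquad q = \alpha + i\beta, \qquad \frac{\beta}{\alpha} = \tan\tfrac{1}{2}\mu\pi, \qquad x^{\mu} = t, \qquad \mu = \frac{2s}{2s+1}\ (s \text{ a positive integer}),$$

we find that the family

$$f(x) = k\,e^{-\alpha x^{\mu}}\{1 + \epsilon\cos(\beta x^{\mu})\}, \qquad (4.36)$$

$$-\infty < x < \infty,\quad \alpha > 0,\quad 0 < \mu < 1,\quad |\epsilon| < 1,$$

all have the same moments, the range in this case being infinite in both directions.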
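The statement that the moments of the family (4.35) do not depend on $\epsilon$ amounts to the vanishing of $\int_0^\infty x^n e^{-\alpha x^{\lambda}}\sin(\beta x^{\lambda})\,dx$ for every non-negative integer n. The sketch below checks this by quadrature after the substitution $x^{\lambda} = t$. The values $\lambda = \tfrac{1}{4}$ and $\alpha = 1$, the truncation point, the step count and the function names are assumptions chosen for illustration.

```python
import math

def sine_moment(n, lam=0.25, alpha=1.0, upper=80.0, steps=80000):
    """Simpson estimate of integral_0^inf x^n exp(-alpha x^lam) sin(beta x^lam) dx,
    with beta = alpha * tan(lam * pi), computed in the variable t = x^lam."""
    beta = alpha * math.tan(lam * math.pi)
    p = (n + 1) / lam

    def integrand(t):
        if t == 0.0:
            return 0.0
        return t ** (p - 1.0) * math.exp(-alpha * t) * math.sin(beta * t) / lam

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0

def plain_moment(n, lam=0.25, alpha=1.0):
    """integral_0^inf x^n exp(-alpha x^lam) dx = Gamma(p) / (lam * alpha^p), p = (n+1)/lam."""
    p = (n + 1) / lam
    return math.gamma(p) / (lam * alpha ** p)

if __name__ == "__main__":
    for n in range(3):
        print(n, sine_moment(n), plain_moment(n))
```

The sine integrals come out negligible beside the corresponding plain moments, so adding $\epsilon$ times the sine term leaves each of these moments unchanged to the accuracy of the quadrature.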


4.17. In full generality the problem of moments may be formulated as follows:
Given a sequence of constants $c_0, c_1, \ldots, c_n, \ldots$

(i) Does there exist a distribution function F such that

$$\int_{a}^{b} x^r\,dF = c_r, \qquad r = 0, 1, 2, \ldots\ ? \qquad (4.37)$$

(ii) If so, is the distribution function unique ?
ns, if any ? 


EE t . ur treatmo n i 
of statistical interest, but we ma: indi OD 
results of Stieltjes. ое thie Principal 
If we express the series 
2.3. . ia 
— zi : (4.38) 
as a continued fraction of the form 
1 1 1 1 1 1 
az + a: + az + а, + nz + ay, F > + (4.39) 
then, if the limits in (4 37) are 0 to co 


xu 


existence of at least one F that all the a's be positive; and F is unique or not according as $\sum_{j=1}^{\infty} a_j$ diverges or converges.

The case when the limits in (4.37) are $\pm\infty$ has been treated by Hamburger (1920), who showed that an F exists if the expression of (4.38) as a continued fraction of the form


bo b, b. 


» E ° ‚ (4.40) 
а +2 + Gic £d а +2 + 


gives positive values of the b’s. In order that F may be unique it is necessary and sufficient 
that the continued fraction be completely convergent in a sense defined by Hamburger. 

We shall see presently that for finite limits in the integral of (4.37) the function F is 
always unique. 


4.18. Unfortunately the Stieltjes-Hamburger criteria are not of much practical use 
because, as a rule, it is too difficult to express the a's and b’s of (4.39) and (4.40) explicitly 
enough in terms of the given c's to enable questions of sign or convergence to be decided. 
We may, however, derive some criteria of statistical importance by considering the more 
restricted problem : given the moments of a distribution, can any other distribution also 
have the moments? In other words, we are given the existence of one F and require 
to know whether F is unique.

Note in the first instance that this problem need only be considered when absolute moments of all orders exist. It is evident that more than one distribution can exist having a limited number of moments finite and the remainder infinite. Furthermore, if any moment of even order exists, those of lower order must exist. In particular, if $\mu'_{2r}$ exists, $\int_{-\infty}^{0} x^{2r}\,dF$ and $\int_{0}^{\infty} x^{2r}\,dF$ exist separately, and hence so do $\int_{-\infty}^{0} |x^{2r-1}|\,dF$ and $\int_{0}^{\infty} |x^{2r-1}|\,dF$, and so also does $\int_{-\infty}^{\infty} |x^{2r-1}|\,dF$, the absolute moment of order $2r - 1$.


Thus we consider only the case when all absolute moments exist. 


4.19. We will prove in the first place the theorem that a set of moments determines a distribution uniquely if the series $\sum_{j=0}^{\infty}\nu_j t^j/j!$ converges for some real non-zero t. We write $\nu$ without the prime in this and the following sections to save printing, but the result is true for moments about any origin.

The characteristic function is continuous in t and its derivatives exist if the moments exist. We have then in the neighbourhood of t = 0

$$\phi(t) = \sum_{j=0}^{r-1}\frac{\mu_j(it)^j}{j!} + R_r, \qquad (4.41)$$

where $R_r$ is less in absolute value than $\nu_r|t|^r/r!$ (3.14).

Thus if $\sum\nu_j t^j/j!$ converges, $R_r$ tends to zero and hence $\phi(t)$ is equal to the sum of the infinite series $\sum\mu_j(it)^j/j!$ if it exists. Moreover, this series is majorated by $\sum\nu_j|t|^j/j!$ and hence is absolutely convergent if the latter is convergent. Hence we have

$$\phi(t) = \sum_{j=0}^{\infty}\frac{\mu_j(it)^j}{j!} \qquad (4.42)$$

and thus $\phi(t)$ is uniquely determined in the neighbourhood of t = 0. In the neighbourhood of $t = t_0$, we have

$$\phi(t) = \sum_{j=0}^{r-1}\frac{(t - t_0)^j}{j!}\int_{-\infty}^{\infty}(ix)^j e^{it_0x}\,dF + R_r$$

and the modulus of the coefficient of $(t - t_0)^j/j!$ is not greater than $\nu_j$. Consequently $\phi(t)$ can be expanded about $t_0$ in a Taylor series and is equal to the sum of that series. Hence $\phi(t)$ may be extended from the neighbourhood of t = 0 by analytic continuation through any finite t-interval. Hence $\phi(t)$ is everywhere uniquely defined.

But $\phi(t)$ determines the distribution function and hence the latter is uniquely determined.


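4.20. As a corollary of this theorem we have the result that a set of moments uniquely determines a distribution if

$$\limsup_{n\to\infty}\frac{1}{n}\,\nu_n^{1/n}$$

is finite. For the series $\sum\nu_n t^n/n!$ is convergent if

$$\limsup_{n\to\infty}\left(\frac{\nu_n t^n}{n!}\right)^{1/n} = \limsup_{n\to\infty}\frac{\nu_n^{1/n}\,e\,t}{n} < 1,$$

and when the upper limit is finite this can be secured by taking t small enough.

An equivalent condition is that $\limsup\,\frac{1}{2n}\,\mu_{2n}^{1/2n}$ should be finite. For

$$\nu_{2n-1}^{1/(2n-1)} \le \nu_{2n}^{1/2n} \le \nu_{2n+1}^{1/(2n+1)},$$

so that

$$\frac{2n-1}{2n}\cdot\frac{1}{2n-1}\,\nu_{2n-1}^{1/(2n-1)} \le \frac{1}{2n}\,\nu_{2n}^{1/2n} \le \frac{2n+1}{2n}\cdot\frac{1}{2n+1}\,\nu_{2n+1}^{1/(2n+1)}.$$

Taking upper limits throughout we have

$$\limsup\frac{1}{2n-1}\,\nu_{2n-1}^{1/(2n-1)} \le \limsup\frac{1}{2n}\,\nu_{2n}^{1/2n} \le \limsup\frac{1}{2n+1}\,\nu_{2n+1}^{1/(2n+1)},$$

and thus $\limsup\,\frac{1}{n}\,\nu_n^{1/n}$ and $\limsup\,\frac{1}{2n}\,\nu_{2n}^{1/2n}$ are finite or infinite together.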


4.21. As a further corollary we have the result that a set of moments uniquely determines a distribution if the range is finite. For suppose the range is a to b. Taking an origin at $x = a$ and letting $b - a = c$, we have

$$\nu_n = \int_{0}^{c} x^n\,dF \le c^n.$$

Thus

$$\frac{1}{n}\,\nu_n^{1/n} \le \frac{c}{n}$$

and hence

$$\lim_{n\to\infty}\frac{1}{n}\,\nu_n^{1/n} = 0.$$


4.22. Two further criteria may be mentioned. The first is due to Carleman (1925). A set of moments determines a distribution uniquely if (in the case of limits $-\infty$ to $+\infty$)

$$\sum_{j=0}^{\infty}\frac{1}{(\mu_{2j})^{1/2j}}$$

diverges. For the limits 0 to $\infty$ the corresponding series is

$$\sum_{j=0}^{\infty}\frac{1}{(\mu_{j})^{1/2j}}.$$

Secondly, if there exists a frequency function, the moments determine it uniquely if, for limits $-\infty$ to $+\infty$,

$$f(x) < M|x|^{\beta-1}e^{-\alpha|x|^{\lambda}} \text{ for } |x| > x_0 \qquad (M, \beta, \alpha, x_0 > 0,\ \lambda \ge 1)$$

and for limits 0 to $\infty$

$$f(x) < M|x|^{\beta-1}e^{-\alpha|x|^{\lambda}} \text{ for } |x| > x_0 \qquad (M, \beta, \alpha, x_0 > 0,\ \lambda \ge \tfrac{1}{2}). \qquad (4.47)$$

This result is due ultimately to Stieltjes. It follows without difficulty from the Carleman criterion.

It is interesting to note that if for some $x_0$,

$$f(x) > e^{-\alpha|x|^{\lambda}} \qquad (x > x_0) \qquad (4.48)$$

then the problem of moments is necessarily indeterminate (as usual, $\lambda < \tfrac{1}{2}$ for the range 0 to $\infty$ and $\lambda < 1$ for the range $-\infty$ to $+\infty$). This follows from the examples in equations (4.35) and (4.36), for we can add to (4.48), without rendering any frequency negative, a function all of whose moments are zero.


Example 4.7 
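The moments of the distribution

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}\,dx, \qquad -\infty < x < \infty,$$

are given by

$$\mu_{2r+1} = 0, \qquad \mu_{2r} = \frac{(2r)!}{2^r\,r!}\,\sigma^{2r}.$$

We have

$$\frac{1}{2n}\,\mu_{2n}^{1/2n} = \frac{\sigma}{2n}\left\{\frac{(2n)!}{2^n\,n!}\right\}^{1/2n} \sim \frac{\sigma}{2n}\left\{\sqrt{2}\left(\frac{2n}{e}\right)^{n}\right\}^{1/2n} \sim \frac{\sigma}{\sqrt{2ne}}$$

and thus the upper limit is zero and the distribution is unique.*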
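The behaviour of $\frac{1}{2n}\mu_{2n}^{1/2n}$ for the normal distribution is easily tabulated. The sketch below works with logarithms of the factorials to avoid overflow; the function name and the chosen values of n are assumptions for illustration, and the comparison value $\sigma/\sqrt{2ne}$ is the Stirling approximation.

```python
import math

def scaled_root_moment(n, sigma=1.0):
    """(1/2n) * mu_{2n}^{1/2n} for the normal distribution,
    where mu_{2n} = (2n)! sigma^{2n} / (2^n n!)."""
    log_mu = (math.lgamma(2 * n + 1) - n * math.log(2.0)
              - math.lgamma(n + 1) + 2 * n * math.log(sigma))
    return math.exp(log_mu / (2 * n)) / (2 * n)

def stirling_value(n, sigma=1.0):
    """The asymptotic form sigma / sqrt(2 n e)."""
    return sigma / math.sqrt(2.0 * n * math.e)

if __name__ == "__main__":
    for n in (1, 2, 3, 10, 100, 1000):
        print(n, scaled_root_moment(n), stirling_value(n))
```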


4.23. If the moment of order r exists it must be given by

$$\mu'_r = (-i)^r\left[\frac{d^r\phi(t)}{dt^r}\right]_{t=0}.$$

Thus if $\phi(t)$ can be expanded in an infinite Taylor series, that series must be $\sum\mu'_r(it)^r/r!$. Further, if this series does not converge, $\phi(t)$ cannot be expanded as an infinite Taylor series. But it can always be expanded in the finite form with remainder

$$\phi(t) = \sum_{j=0}^{r}\frac{\mu'_j(it)^j}{j!} + R_r.$$


Thus, when the series does not converge, ¢( 


asymptotically. 
This illustrates the sou 


d, 
Series х0) Hs 


!) can be expanded in powers of ¢ only 


For instance, 


all have the same 
2 7 б. P Eos m 
expansion. Tt is therefore hardly surprising that when X a or 5 Hr l, fail to converge, 
there may be more than one frequency or distribution function w 
moments. 


ith the same set of 
But it does not 


follow from what has been said that there must be more than one 
frequency-distribution т 


я those functions ma 


eristic function, for it does not obey 
the well-known condition that ¢(t) and Ф(— t) should be conjugate, So far as I am aware, 
ГЕК SG , 
it is not known whether the condition that re should converge is necessary ag 
well as sufficient for uniqueness, 


The Second Limit Theorem 


* Cramér and Wold (1936, J. Lond. Math. Soc., 11, 290) have extended Carleman's criterion by showing that if in a multivariate distribution $\lambda_{2n} = \mu_{2n,0,\ldots} + \mu_{0,2n,\ldots} + $ etc., the distribution is determined by its moments if $\sum(\lambda_{2n})^{-1/2n}$ diverges.


We will first prove the rather more general theorem: If there is given a sequence of distribution functions $F_n$ such that all moments of $F_n$ exist, and for any j the sequence $\mu_j(n)$ lies between fixed limits independent of n, then a subsequence $F_{n'}$ can be selected from $F_n$ such that

(1) $\lim_{n'\to\infty}\int_{-\infty}^{\infty} x^j\,dF_{n'}$ exists, $= \mu_j$ say.

(2) The subsequence $F_{n'}$ converges to some distribution function F.

(3) $\int_{-\infty}^{\infty} x^j\,dF$ exists and is equal to $\mu_j$.

The existence of $\mu_j$ may be proved by the diagonal method exactly as for the Montel-Helly theorem of 4.10. By hypothesis, $\mu_j(n)$ is uniformly bounded and the rest of the proof follows that of 4.10.

The existence of F follows also from the Montel-Helly theorem. We apply the theorem to the subsequence derived by satisfying condition (1) and hence arrive at a subsequence obeying both (1) and (2). It must however be shown that F is a distribution function, i.e. varies effectively from 0 to 1. This follows because

$$\left|\int_{b}^{\infty} x^r\,dF_n\right| \le \frac{\mu_{2r+2}(n)}{b^{r+2}},\quad b > 0; \qquad \left|\int_{-\infty}^{a} x^r\,dF_n\right| \le \frac{\mu_{2r+2}(n)}{|a|^{r+2}},\quad a < 0, \qquad (4.49)$$

and hence, for the subsequence, with r = 0 and letting n' tend to infinity,

$$0 \le 1 - F(b) \le \frac{\mu_2}{b^2},$$

$$0 \le F(a) \le \frac{\mu_2}{a^2},$$

so that, as $-a$, b tend to infinity the equations $F(\infty) = 1$, $F(-\infty) = 0$ are seen to hold.


We also require for later parts of the proof two results: the first that the convergence of

$$\lim_{a\to-\infty,\ b\to\infty}\int_{a}^{b} x^r\,dF_n \quad\text{to}\quad \int_{-\infty}^{\infty} x^r\,dF_n \qquad (4.50)$$

is uniform with respect to n. This follows from the hypothesis that $\mu_{2r+2}(n)$ is bounded and from the equations (4.49). The second is that

$$\lim_{b\to\infty} b^s\{1 - F_n(b)\} = 0, \qquad \lim_{a\to-\infty} |a|^s F_n(a) = 0 \qquad (4.51)$$

for s > 0 and all integral n > 0. The first limit follows from


1 — F,(b) - | ар, < | G аР, 
b b Nb 
ànd hence from 


b — F0) < I b> 0, 0-sc-9j 


oo 


We now have to complete the proof by showing that $\int_{-\infty}^{\infty} x^j\,dF$ exists and is equal to $\mu_j$. For this we use the theorem (an extension by Fréchet and Shohat (1931) of one due to Helly) that if a sequence $v_n(x)$, defined in the interval $-\infty$ to $+\infty$, is such that

(1) $v_n(x)$ is of bounded variation in any finite interval,

(2) all $v_n(x)$ and their total variations are bounded in any finite interval,

(3) $\lim_{n\to\infty} v_n(x) = v(x)$ exists for all x, except perhaps at a denumerable number of points,

(4) $\int_{a}^{b} f(x)\,dv_n(x)$ converges uniformly with respect to n to $\int_{-\infty}^{\infty} f(x)\,dv_n(x)$ if f(x) is everywhere continuous,

then $\int_{-\infty}^{\infty} f(x)\,dv(x)$ exists and $= \lim_{n\to\infty}\int_{-\infty}^{\infty} f(x)\,dv_n(x)$.

This result may be applied to our sequence $F_{n'}(x)$, which obeys conditions (1), (2) and (3). It also obeys (4) when $f(x) = x^j$ in virtue of (4.50) and (4.51). Further F(x) is of bounded variation and hence $\int_{-\infty}^{\infty} x^j\,dF(x)$ exists and equals, say, $\mu'_j$.

Finally

$$|\mu'_j - \mu_j(n)| = \left|\int_{-\infty}^{\infty} x^j\,(dF - dF_n)\right|$$

$$\le \left|\int_{-\infty}^{a} x^j\,dF\right| + \left|\int_{-\infty}^{a} x^j\,dF_n\right| + \left|\int_{b}^{\infty} x^j\,dF\right| + \left|\int_{b}^{\infty} x^j\,dF_n\right| + \left|\int_{a}^{b} x^j\,(dF - dF_n)\right|, \qquad a < 0,\ b > 0. \qquad (4.52)$$

By taking $-a$ and b sufficiently large we can make the first four terms on the right as small as we please, for $-a > -a_0$, $b > b_0$ and some $n > n_0$. Then by taking n sufficiently large we can make the fifth term as small as we please (without affecting the smallness of the other terms). Hence $|\mu'_j - \mu_j(n)|$ may be made as small as we please.

This establishes the more general result. The theorem enunciated at the beginning of the section follows as a corollary. In fact, if $\mu_j(n)$ tends to a limit $\mu_j$, then the subsequence $F_{n'}$ can always be selected and tends to a distribution function F with the moments $\mu_j$. All we have to prove is that if the $\mu_j$ are such that they uniquely determine F, the sequence $F_n$ itself converges to F.

Suppose that there exists a point of continuity $x_0$ of F such that $F_n(x_0)$ does not converge to $F(x_0)$. Then a subsequence $F_{n''}$ can be chosen which converges to some other value at $x_0$. But from this subsequence we can select a further subsequence $F_{n'''}$ converging say to $F^*(x)$, having the moments $\mu_j$. Since these moments uniquely determine F, $F^*$ coincides with F in all points of continuity, i.e.

$$\lim_{n'''\to\infty} F_{n'''}(x_0) = F(x_0).$$

This is impossible, for $F_{n'''}(x_0)$ is a subsequence of $F_{n''}(x_0)$ which converges, but not to $F(x_0)$.


4.25. The above proof can hardly be described as easy, though it depends only on simple notions such as continuity and convergence, but the Second Limit Theorem is so important that it has seemed worth while reproducing the proof in full. Many examples of its application will occur in the sequel. The chapter may be concluded with an illustration of its use in determining the limiting forms of distributions.


Example 4.8 


The discontinuous distribution whose frequency at $x = j$ ($j = 0, 1, \ldots$) is $e^{-m} m^j / j!$ has a characteristic function
$$\phi(t) = \exp\{m(e^{it} - 1)\},$$
and hence all cumulants equal to $m$.

The distribution is evidently the only one with such cumulants, for $\sum_{r=1}^{\infty} \kappa_r \dfrac{(it)^r}{r!} = m \sum_{r=1}^{\infty} \dfrac{(it)^r}{r!}$ is convergent and equals $m(e^{it} - 1)$, so that the cumulative function and the characteristic function are uniquely determined.

Now as $m$ tends to infinity the frequency at $j$, $e^{-m} m^j / j!$, tends to zero and thus the distribution does not tend to a limit. This is consistent with the behaviour of the cumulants, which increase without limit.

Suppose, however, we express the distribution in standard measure. Then
$$\kappa_r = \frac{m}{m^{r/2}} = \frac{1}{m^{\frac{1}{2}r - 1}}.$$
Hence as $m \to \infty$ all cumulants higher than the second tend to zero, and hence the cumulants of the distribution tend to those of the normal distribution
$$dF = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \sqrt{m})^2}\,dx, \qquad -\infty \le x \le \infty,$$
with the mean $m^{\frac{1}{2}}$.

Now we know that this distribution is completely determined by its moments 
(Example 4.7). We also know that the cumulants determine the moments and vice-versa,
so that if the cumulants of the discontinuous distribution tend to those of the normal
distribution, the moments will tend to the moments of that distribution. Hence the 
Second Limit Theorem is applicable, and the discontinuous distribution does in fact tend 
to the normal form when expressed in standard measure. 
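The limiting process of this example can be watched numerically. The following is only an illustrative sketch: the function names and the chosen values of $m$ are our own, and the variate is centred at $m$ and scaled by $\sqrt{m}$, which is the same comparison as setting the scaled variate against a normal curve with mean $\sqrt{m}$.

```python
import math

def poisson_pmf(j, m):
    # e^{-m} m^j / j!, computed through logarithms to avoid overflow for large m
    return math.exp(-m + j * math.log(m) - math.lgamma(j + 1))

def standardised_cumulant(r, m):
    # r-th cumulant after division of the variate by sqrt(m): m / m^{r/2}
    return m ** (1 - r / 2)

def max_cdf_gap(m):
    # Largest gap between the distribution function of the scaled variate and
    # the normal distribution function, examined just below and just above
    # each jump of the discontinuous distribution.
    gap, cdf = 0.0, 0.0
    upper = int(m + 12 * math.sqrt(m)) + 1
    for j in range(upper):
        z = (j - m) / math.sqrt(m)
        normal = 0.5 * math.erfc(-z / math.sqrt(2))
        gap = max(gap, abs(cdf - normal))   # just below the jump at j
        cdf += poisson_pmf(j, m)
        gap = max(gap, abs(cdf - normal))   # just above the jump at j
    return gap

for m in (1, 10, 100, 1000):
    print(m, standardised_cumulant(3, m), standardised_cumulant(4, m), max_cdf_gap(m))
```

The third and fourth cumulants and the largest gap between the two distribution functions should all be seen to diminish as $m$ increases.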


NOTES AND REFERENCES 


The idea of the characteristic function can be traced back as far as Laplace, but its introduction into the theory of statistics, through the theory of probability, is mainly due to Poincaré and Lévy (1925), whose book provides the most readable and complete account of the function. More recent researches are outlined by Cramér (1937). The proof of the First Limit Theorem is substantially that given by Lévy. The converse, given originally in a somewhat less general form by Lévy, was proved simultaneously by him and Cramér, the proof in 4.12 following the latter's.

The Second Limit Theorem seems to have been first proved by Markoff for the case when the limiting form is the normal distribution $dF = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\,dx$. It was subsequently considered and extended by several writers, the general form of 4.24 being due to Fréchet and Shohat (1931), whose proof has been closely followed. Some references to prior work are given by these authors.

The problem of moments appears to have been first considered and solved by Tchebycheff. The memoir by Stieltjes (1918; the memoir was first published in 1894) is classical. For some subsequent work see Hamburger (1920) and Carleman (1925).


Carleman, T. (1925), Les fonctions quasi-analytiques, Gauthier-Villars, Paris. 


Cramér, H. (1937), Random Variables and Probability Distributions, Cambridge University 
Press. 


Fréchet, M., and Shohat, J. (1931), “ A Proof of the Generalised Second-Limit Theorem,” 
Trans. Am. Math. Soc., 33, 533. 

Hamburger, H. (1920, 1921), “Über eine Erweiterung des Stieltjesschen Momentproblems,” Math. Annalen, 81, 235, and 82, 120 and 168.

Lévy, P. (1925), Calcul des probabilités, Gauthier-Villars, Paris.

Stieltjes, J. (1918), Recherches sur les fractions continues, Œuvres, Groningen.


EXERCISES 
4.1. Show that if a frequency function $f(x)$ is symmetrical the characteristic function is an even function, i.e. $\phi(t) = \phi(-t)$, and that therefore $\phi(t)$ is real; and conversely, if $\phi(t)$ is real the frequency function, if any, is symmetrical.


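4.2. Show that the function
$$\phi(t) = \left(\frac{e^{it} - 1}{it}\right)^n, \qquad n \text{ a positive integer},$$
is the characteristic function of a distribution function
$$F(x) = \frac{1}{n!}\left\{x^n - \binom{n}{1}(x - 1)^n + \binom{n}{2}(x - 2)^n - \ldots\right\}.$$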
4.3. Show that the factorial moment-generating function $\omega(t)$ of the binomial $(q + p)^n$ is $(1 + pt)^n$, and hence that
$$\mu'_{[r]} = p^r n^{[r]}.$$


4.4. If for a certain distribution
$$\kappa_r = b a^r,$$
$a$ and $b$ being positive constants, show that the distribution is discontinuous with variate-values $0, a, 2a, \ldots$ and the frequency at $ra$ equal to $\dfrac{e^{-b} b^r}{r!}$.
4.5. Show that the function е cannot be a characteristic function unless œ 9 


ye Ке ra у. 


4.6. Show that there is only one distribution with moments given by 
Го + n) i 
b T 


and that it is 
dF = 


1 
eT» dy 


T(r) Ocr« оо, 


4.7. A theorem due to Weierstrass states that any function continuous in the range $(a, b)$ can be represented by a uniformly convergent series of polynomials $\sum_{n=0}^{\infty} P_n(x)$, $P_n(x)$ being of degree $n$ in $x$. Deduce that if two continuous frequency functions, $f_1$ and $f_2$, have the same moments of all orders,
$$\int_a^b (f_1 - f_2)^2\,dx = 0,$$
and hence that the moments determine a distribution uniquely if it is continuous and of finite range.


4.8. If 0 is a non-negative function of the variate = and 
a(t) = f 0 dF(x), 


show that the frequency function of 0, if any, is given by 
fet. f 0-t-1a(t) dt. 


9zi 


. . 49. Show that if a characteristic function ф@) possesses derivatives up to and 
Including the second order, then 

4°ф 

(ж). 


(а). | 


4.10. А theorem of Denjoy (Comptes rendus, 1921, 173, 1399) states that if a function 
I(x) defined in a range (a, b) possesses derivatives of all orders, if M, is the maximum of 


and generalise this result. 


^ : TEE S " У 
Ife (x)| in the range and if Z— is divergent, then f(x) is completely determined by 
' Mx 
its value and that of its derivatives at a single point. Use this result to show that a set 
of moments determines a distribution uniquely if I diverges. 


ГАД 


CHAPTER 5
STANDARD DISTRIBUTIONS—(1) 


5.1. There are certain distribution and frequency functions which, for both theoretical and practical reasons, occupy a central position in statistical theory. In this and the next chapter we shall consider their properties, leaving their statistical uses to be developed and illustrated later in the book. We shall, however, indicate briefly some of the ways in which they arise, even at the expense of anticipating ideas introduced at a subsequent stage. This will not impair the logical continuity of our development and will give concreteness to a treatment which might otherwise appear somewhat abstract.


The Binomial Distribution 


5.2. Suppose we have a large population of members each of which exhibits either some quality P or a complementary quality Q (= not-P), for example, a population of men who are either blue-eyed or not-blue-eyed. Suppose that the proportion of individuals with quality P is $p$ and that with quality Q is $q$, where of course $p + q = 1$. If we take a random sample of $N$ members from the population we expect that on the average $pN$ members will exhibit P and $Nq$ will exhibit Q. We may thus array the members according to the quality as
$$N(p + q).$$

Now suppose we choose $N$ pairs of individuals. There will be pairs PP, pairs PQ, pairs QP and pairs QQ. Of the $Np$ pairs for which the first member is P there will, on the average, be a proportion $p$ for which the second member is P and $q$ for which it is Q. Similarly for the $Nq$ exhibiting Q in the first member. Thus the pairs may be arrayed as
$$Np(p + q) + Nq(p + q) = N(p + q)^2.$$

Generally if we choose $N$ sets of $n$ the array will be $N(p + q)^n$. That is to say, the proportion of cases containing $j$ P's and $(n - j)$ Q's will be $\binom{n}{j} p^j q^{n-j}$, the term in $p^j q^{n-j}$ in $(p + q)^n$. We are then led to consider the binomial distribution
$$f = (p + q)^n \tag{5.1}$$
as a discontinuous frequency-distribution, the variate being the number of P's in the set of $n$, which may vary from $n$ to 0. If, as is frequently more convenient, we wish to consider the variate as increasing from 0 to $n$, the distribution is inverted, i.e. becomes
$$f = (q + p)^n. \tag{5.2}$$
5.3. Distributions very close to the binomial form are found, particularly in artificial experiments such as coin-tossing or dice-throwing. Some data by Weldon are shown in Table 5.1. Weldon threw 12 dice 26,306 times and noted the values at each throw. This is equivalent to the drawing of samples of 12 from a population. The occurrence of a 5 or a 6 on any die was regarded as the exhibition of quality P, or a “success” as we may call it.




TABLE 5.1

Frequency-distribution of 26,306 Throws of 12 Dice, the Occurrence of a 5 or 6 being counted a Success.

No. of          Observed        Theoretical Frequency from the
Successes.      Frequency.      Binomial 26,306 (0.6623 + 0.3377)^12

 0                  185               187
 1                1,149             1,146
 2                3,265             3,215
 3                5,475             5,465
 4                6,114             6,269
 5                5,194             5,115
 6                3,067             3,043
 7                1,331             1,330
 8                  403               424
 9                  105                96
10 and over          18                16

TOTAL            26,306            26,306


If the dice were perfect (a condition rarely realised in practice) the proportion $p$ of successes would be $\frac{1}{3}$ and the appropriate binomial would be, in the form (5.2), $(\frac{2}{3} + \frac{1}{3})^{12}$. In this particular case the dice were not quite perfect, the proportion of cases exhibiting a 5 or a 6 being 0.3377. Taking this as the value of $p$, we get the frequency function $(0.6623 + 0.3377)^{12}$, which when multiplied by the total frequency 26,306 gives the theoretical frequencies shown in the third column of Table 5.1. The agreement with observation is evidently fairly good.
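The theoretical column of Table 5.1 may be recomputed directly from the terms of $(q + p)^{12}$. The sketch below is ours (the variable names are illustrative only); it pools 10, 11 and 12 successes into the class “10 and over” as in the table.

```python
from math import comb

# Observed frequencies of Table 5.1; the last class is "10 and over".
observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 18]
total, n, p = 26306, 12, 0.3377
q = 1 - p

# Term in p^j q^(n-j) of (q + p)^n, multiplied by the total frequency.
theoretical = [total * comb(n, j) * p**j * q**(n - j) for j in range(n + 1)]
# Pool 10, 11 and 12 successes into a single class.
theoretical = theoretical[:10] + [sum(theoretical[10:])]

for j, (obs, th) in enumerate(zip(observed, theoretical)):
    print(j, obs, round(th))
```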


5.4. Taking our variate to be increasing, we have, from (5.2), that the frequency at $x = j$ is $\binom{n}{j} q^{n-j} p^j$. The characteristic function of the distribution is then
$$\phi(t) = \sum_{j=0}^{n} \binom{n}{j} q^{n-j} p^j e^{itj} = (q + pe^{it})^n. \tag{5.3}$$
We then have for the moment of order $j$ about the origin, from (3.11),
$$\mu'_j = (-i)^j \left[\frac{d^j}{dt^j}(q + pe^{it})^n\right]_{t=0}$$
and hence
$$\mu'_1 = np, \qquad \mu'_2 = np + n(n - 1)p^2 \tag{5.4}$$
and so on. We find
$$\mu_2 = npq \tag{5.5}$$
$$\mu_3 = npq(q - p) \tag{5.6}$$
$$\mu_4 = 3n^2p^2q^2 + pqn(1 - 6pq) \tag{5.7}$$
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(q - p)^2}{npq} \tag{5.8}$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{1 - 6pq}{npq}. \tag{5.9}$$
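Formulae (5.5) to (5.9) are easily checked by direct summation over the terms of the binomial. The following sketch is an illustration of our own, using exact rational arithmetic and an arbitrarily chosen $n$ and $p$.

```python
from fractions import Fraction
from math import comb

def central_moment(r, n, p):
    # r-th moment about the mean np, summed term by term over (q + p)^n
    q = 1 - p
    mean = n * p
    return sum(comb(n, j) * q**(n - j) * p**j * (j - mean)**r for j in range(n + 1))

n, p = 12, Fraction(1, 3)          # illustrative values only
q = 1 - p
mu2, mu3, mu4 = (central_moment(r, n, p) for r in (2, 3, 4))

print(mu2 == n*p*q)                                     # (5.5)
print(mu3 == n*p*q*(q - p))                             # (5.6)
print(mu4 == 3*n**2*p**2*q**2 + p*q*n*(1 - 6*p*q))      # (5.7)
print(mu3**2 / mu2**3 == (q - p)**2 / (n*p*q))          # (5.8)
print(mu4 / mu2**2 == 3 + (1 - 6*p*q) / (n*p*q))        # (5.9)
```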




5.5. Further formulae are not often required, but when they are, they can be derived from some interesting recurrence relations connecting the moments of the binomial. Writing $\theta = it$ we have, for the characteristic function referred to the mean as origin,
$$\phi(t) = e^{-np\theta}(q + pe^{\theta})^n. \tag{5.10}$$
Differentiating with respect to $\theta$ we find
$$\sum_{j=1}^{\infty} \mu_j \frac{\theta^{j-1}}{(j-1)!} = -np\,e^{-np\theta}(q + pe^{\theta})^n + n e^{-np\theta}(q + pe^{\theta})^{n-1} pe^{\theta} = \sum_{j=0}^{\infty} \mu_j \frac{\theta^j}{j!}\left\{-np + \frac{npe^{\theta}}{q + pe^{\theta}}\right\},$$
and hence, after a little re-arrangement,
$$(q + pe^{\theta}) \sum_{j=1}^{\infty} \mu_j \frac{\theta^{j-1}}{(j-1)!} - npq(e^{\theta} - 1) \sum_{j=0}^{\infty} \mu_j \frac{\theta^j}{j!} = 0.$$
Identifying coefficients in $\theta^{r-1}$ we get
$$\mu_r = npq \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_j - p \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_{j+1}, \tag{5.11}$$
giving the moment of order $r$ about the mean in terms of those of lower orders.

Furthermore, writing the moment about the mean as
$$\mu_r = \sum_{j=0}^{n} \binom{n}{j} (j - np)^r q^{n-j} p^j,$$
we have, differentiating with respect to $p$,
$$\frac{d\mu_r}{dp} = -rn \sum_{j} \binom{n}{j} (j - np)^{r-1} q^{n-j} p^j + \sum_{j} \binom{n}{j} (j - np)^r \left\{ j\,q^{n-j} p^{j-1} - (n - j)\,q^{n-j-1} p^j \right\}.$$
The first term on the right is $-rn\mu_{r-1}$. The sum of the other two will be found to be $\dfrac{\mu_{r+1}}{pq}$. Hence we find
$$\mu_{r+1} = pq\left\{ nr\mu_{r-1} + \frac{d\mu_r}{dp} \right\}. \tag{5.12}$$
For example, $\mu_1 = 0$, $\mu_2 = npq = np(1 - p)$ and hence
$$\mu_3 = pq(n - 2np) = npq(q - p)$$
as stated in (5.6).

For factorial moments about the origin the expressions are very simple in form. In fact, differentiating $(q + p)^n$ $r$ times partially with respect to $p$ and multiplying by $p^r$ we have, writing $n^{[r]}$ for $n(n-1)\ldots(n-r+1)$,
$$n^{[r]} p^r (q + p)^{n-r} = \sum_{j=r}^{n} j^{[r]} \binom{n}{j} q^{n-j} p^j = \mu'_{[r]},$$
and hence, since $q + p = 1$,
$$\mu'_{[r]} = n^{[r]} p^r.$$
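Both recurrence relations lend themselves to a numerical check. In the sketch below (our own illustration, with hypothetical function names) relation (5.11) is verified exactly in rational arithmetic, while in (5.12) the derivative with respect to $p$ is replaced by a central difference, so that agreement is only approximate.

```python
from fractions import Fraction
from math import comb

def mu(r, n, p):
    # r-th moment of the binomial (q + p)^n about its mean np
    q = 1 - p
    return sum(comb(n, j) * q**(n - j) * p**j * (j - n * p)**r for j in range(n + 1))

def mu_from_lower(r, n, p):
    # Right-hand side of (5.11); range(r - 1) runs over j = 0, ..., r - 2.
    q = 1 - p
    first = sum(comb(r - 1, j) * mu(j, n, p) for j in range(r - 1))
    second = sum(comb(r - 1, j) * mu(j + 1, n, p) for j in range(r - 1))
    return n * p * q * first - p * second

def mu_next(r, n, p, h=1e-5):
    # Right-hand side of (5.12), d(mu_r)/dp being approximated by a central difference.
    derivative = (mu(r, n, p + h) - mu(r, n, p - h)) / (2 * h)
    return p * (1 - p) * (n * r * mu(r - 1, n, p) + derivative)

n, p = 10, Fraction(3, 10)         # illustrative values only
for r in range(2, 7):
    print(r, mu(r, n, p), mu_from_lower(r, n, p))
print(mu(4, n, 0.3), mu_next(3, n, 0.3))
```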




5.6. If $p = q$ the binomial distribution is obviously symmetrical. If $p \ne q$ the distribution is skew. But in both cases it will be unimodal unless $pn$ is small. For the frequency of the $(r + 1)$th term is greater than that of the $r$th so long as
$$\binom{n}{r+1} q^{n-r-1} p^{r+1} > \binom{n}{r} q^{n-r} p^r$$
or
$$\frac{n!}{(n - r)!\,r!} \cdot \frac{(n - r - 1)!\,(r + 1)!}{n!} < \frac{p}{q}$$
or
$$\frac{r + 1}{n - r} < \frac{p}{q},$$
which is equivalent to
$$\frac{r + 1}{n + 1} < p.$$
Hence the frequency increases until the point when $(r + 1) > p(n + 1)$ and then declines again. Some typical distributions are shown in Table 5.2.


TABLE 5.2

Terms of the Binomial Distribution 10,000 (q + p)^20 for Values of p from 0.1 to 0.5.
(Figures given to the nearest unit.)

Number of     p = 0.1    p = 0.2    p = 0.3    p = 0.4    p = 0.5
Successes.    q = 0.9    q = 0.8    q = 0.7    q = 0.6    q = 0.5

    0           1216        115          8          -          -
    1           2702        576         68          5          -
    2           2852       1369        278         31          2
    3           1901       2054        716        123         11
    4            898       2182       1304        350         46
    5            319       1746       1789        746        148
    6             89       1091       1916       1244        370
    7             20        545       1643       1659        739
    8              4        222       1144       1797       1201
    9              1         74        654       1597       1602
   10              -         20        308       1171       1762
   11              -          5        120        710       1602
   12              -          1         39        355       1201
   13              -          -         10        146        739
   14              -          -          2         49        370
   15              -          -          -         13        148
   16              -          -          -          3         46
   17              -          -          -          -         11
   18              -          -          -          -          2
   19              -          -          -          -          -
   20              -          -          -          -          -


5.7. The ordinates of the binomial are most directly calculated from the formula $\binom{n}{j} q^{n-j} p^j$; for low values of $n$ the calculation is straightforward and for high values assistance can be derived from tables of $\log n!$. The calculation of the distribution function, which is equivalent to the summation of terms of the binomial, is tedious to perform directly, but use may be made of the tables of the incomplete B-function. We have, for Taylor's series with the integral form of remainder,
$$f(a + h) = \sum_{j=0}^{r-1} \frac{h^j}{j!} f^{(j)}(a) + \int_0^h \frac{(h - t)^{r-1}}{(r - 1)!} f^{(r)}(a + t)\,dt. \tag{5.13}$$
Putting $a = q$, $h = p$ and $f(a + h) = (q + p)^n$ we have
$$(q + p)^n = \sum_{j=0}^{r-1} \binom{n}{j} q^{n-j} p^j + R_r,$$
where $R_r$ is the remainder after $r$ terms and equals
$$R_r = \frac{n!}{(r - 1)!\,(n - r)!} \int_0^p (p - t)^{r-1} (q + t)^{n-r}\,dt. \tag{5.14}$$
In (5.14) put $t = p - y$, so that $q + t = 1 - y$. We find
$$R_r = \frac{n!}{(r - 1)!\,(n - r)!} \int_0^p y^{r-1} (1 - y)^{n-r}\,dy = \frac{\Gamma(n + 1)}{\Gamma(r)\,\Gamma(n - r + 1)} \int_0^p y^{r-1} (1 - y)^{n-r}\,dy = \frac{B_p(r, n - r + 1)}{B(r, n - r + 1)} = I_p(r, n - r + 1) \tag{5.15}$$
in the usual notation. This is also equal to
$$1 - I_q(n - r + 1, r) \tag{5.16}$$
by a well-known property of the B-function. The remainder after $r - 1$ terms is, similarly, $1 - I_q(n - r + 2, r - 1)$, and hence the $r$th term is
$$I_q(n - r + 1, r) - I_q(n - r + 2, r - 1). \tag{5.17}$$


Example 5.1

When $n = 20$, $r = 11$, $p = 0.4$ we have for the remainder after 11 terms $I_{0.4}(11, 10)$, which from the tables is found to be 0.127,521,2. The sum of the corresponding terms in the appropriate column of Table 5.2 is 0.1276, the difference being due to rounding up. The remainder after 12 terms is 0.056,526. The twelfth term (11 “successes”) is then the difference of these two remainders, = 0.0710 as shown in Table 5.2 for the frequency per 10,000 of 11 successes.
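The connection between the binomial remainder and the incomplete B-function may be confirmed without tables. The sketch below is ours: the B-function ratio is obtained by Simpson's rule for integral arguments only, which is an assumption made for brevity and not the method by which the published tables were computed.

```python
from math import comb, factorial

def binomial_tail(n, r, p):
    # Remainder after r terms of (q + p)^n: frequencies of r, r+1, ..., n successes.
    q = 1 - p
    return sum(comb(n, j) * q**(n - j) * p**j for j in range(r, n + 1))

def incomplete_beta_ratio(x, a, b, steps=2000):
    # I_x(a, b) for positive integers a, b, by Simpson's rule (steps must be even)
    # applied to the integrand y^(a-1) (1-y)^(b-1) over (0, x).
    h = x / steps
    f = lambda y: y**(a - 1) * (1 - y)**(b - 1)
    s = f(0) + f(x) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, steps))
    beta = factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)
    return s * h / 3 / beta

n, p = 20, 0.4
after_11 = binomial_tail(n, 11, p)
after_12 = binomial_tail(n, 12, p)
print(after_11, incomplete_beta_ratio(p, 11, 10))   # the two forms of the remainder
print(after_11 - after_12)                          # frequency of 11 successes
```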


The Poisson Distribution 


5.8. Cases sometimes occur in which the proportion $p$ of “successes” in the population is very small. We may suppose our number $n$ large enough to render $np$ itself appreciable though $p$ is small; and we are thus led to consider the limiting form of the binomial (5.2) as $p \to 0$ subject to the condition that $np$ remains finite, and equal to $\lambda$, say. Under these conditions the term
$$\binom{n}{r} q^{n-r} p^r = \frac{n!}{(n - r)!\,r!} \left(\frac{\lambda}{n}\right)^r \left(1 - \frac{\lambda}{n}\right)^{n-r} \sim \frac{\sqrt{2\pi}\,e^{-n} n^{n+\frac{1}{2}}}{\sqrt{2\pi}\,e^{-(n-r)} (n - r)^{n-r+\frac{1}{2}}} \cdot \frac{\lambda^r}{n^r\,r!}\,e^{-\lambda} \to \frac{e^{-\lambda} \lambda^r}{r!}.$$
Thus the terms of the binomial become the successive terms
$$e^{-\lambda}\left(1,\ \lambda,\ \frac{\lambda^2}{2!},\ \ldots,\ \frac{\lambda^r}{r!},\ \ldots\right). \tag{5.18}$$
This is called the Poisson distribution, having been given first by Poisson in 1837. It has since been discovered independently by several other writers.

From the point of view of characteristic functions we have
$$\phi(t) = \lim_{n\to\infty} (q + pe^{it})^n = \lim_{n\to\infty} \left\{1 + \frac{\lambda}{n}(e^{it} - 1)\right\}^n = \exp\{\lambda(e^{it} - 1)\}, \tag{5.19}$$
which is readily verified to result in the distribution (5.18).
Thus
$$\psi(t) = \lambda(e^{it} - 1) = \lambda \sum_{r=1}^{\infty} \frac{(it)^r}{r!}$$
and hence all cumulants of the Poisson distribution are equal to $\lambda$. We thus find
$$\mu'_1 = \lambda, \qquad \mu_2 = \lambda, \qquad \mu_3 = \lambda, \qquad \mu_4 = \lambda + 3\lambda^2. \tag{5.20}$$
If we let $n \to \infty$ in (5.11) and (5.12) we find
$$\mu_r = \lambda \sum_{j=0}^{r-2} \binom{r-1}{j} \mu_j \tag{5.21}$$
and
$$\mu_{r+1} = r\lambda\mu_{r-1} + \lambda\frac{d\mu_r}{d\lambda}. \tag{5.22}$$
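The passage from the binomial to the Poisson form can be illustrated by holding $np = \lambda$ fixed and increasing $n$. The sketch below is our own; the value $\lambda = 2$ and the range of $r$ examined are arbitrary choices.

```python
from math import comb, exp, factorial

def binomial_term(r, n, lam):
    # Term of (q + p)^n with p = lam / n, so that np remains equal to lam.
    p = lam / n
    return comb(n, r) * (1 - p)**(n - r) * p**r

def poisson_term(r, lam):
    # Term e^{-lam} lam^r / r! of (5.18)
    return exp(-lam) * lam**r / factorial(r)

def worst_gap(n, lam=2.0, terms=11):
    # Largest difference between corresponding terms for r = 0, ..., terms - 1.
    return max(abs(binomial_term(r, n, lam) - poisson_term(r, lam)) for r in range(terms))

for n in (10, 100, 1000, 10000):
    print(n, worst_gap(n))
```

The differences should be seen to fall off roughly in proportion to $1/n$.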


5.9. Tables of the function $\dfrac{e^{-\lambda}\lambda^r}{r!}$ for various values of $\lambda$ and $r$ are given in Tables for Statisticians and Biometricians, Part I. The frequency polygons are very skew, almost J-shaped for low values of $\lambda$, but become nearer to unimodal symmetry as $\lambda$ increases. A comparison of the successive terms $\dfrac{\lambda^r}{r!}$ and $\dfrac{\lambda^{r+1}}{(r+1)!}$ shows that the frequency increases up to the point for which $r + 1 < \lambda$ and then decreases again.

The summation of $r$ terms in the Poisson distribution may be carried out in a manner similar to that of 5.7. The remainder after $r$ terms of the distribution is found to be, from (5.13),
$$R_r = \frac{e^{-\lambda}\lambda^r}{(r - 1)!} \int_0^1 e^{\lambda t} (1 - t)^{r-1}\,dt = \frac{\Gamma_\lambda(r)}{\Gamma(r)} = I\!\left(\frac{\lambda}{\sqrt{r}},\ r - 1\right) \tag{5.23}$$
in the notation of Pearson's tables of the Incomplete Γ-function. The argument used in these tables is a difficult one to work with in the present case and, though formula (5.23) may be used for summing a number of terms in the Poisson distribution, it is easier to calculate $\dfrac{e^{-\lambda}\lambda^r}{r!}$ directly rather than to use an analogous expression to (5.17) in the form
$$r\text{th term} = I\!\left(\frac{\lambda}{\sqrt{r - 1}},\ r - 2\right) - I\!\left(\frac{\lambda}{\sqrt{r}},\ r - 1\right).$$

5.10. We now consider a generalisation of the binomial and the Poisson distributions. In 5.2 our approach was based on the drawing of sets of $n$ from the same population. Suppose, however, we draw them from $n$ different populations with proportions
$$(p_1, q_1),\ (p_2, q_2),\ \ldots,\ (p_n, q_n).$$
Then our proportional frequencies will be arrayed by the form
$$(p_1 + q_1)(p_2 + q_2) \ldots (p_n + q_n) = \prod_{j=1}^{n} (p_j + q_j), \tag{5.24}$$
which of course reduces to the binomial if all the $p$'s are equal.
The characteristic function of this distribution is
$$\phi(t) = \prod_j (q_j + p_j e^{it})$$
from which we have
$$\psi(t) = \sum_j \log(q_j + p_j e^{it}) = \sum_j \log\left\{1 + p_j\,it + p_j\frac{(it)^2}{2!} + \ldots\right\} = \sum_j \left\{p_j\,it + (p_j - p_j^2)\frac{(it)^2}{2!} + \ldots\right\},$$
giving
$$\mu'_1 = \sum p_j, \qquad \kappa_2 = \mu_2 = \sum p_j q_j. \tag{5.25}$$
Writing now $\bar{p}$ for the mean of the $p$'s in the different populations, we have
$$\mu'_1 = n\bar{p},$$
$$\mu_2 = \sum pq = \sum p - \sum p^2 = \sum p - \frac{1}{n}\Big(\sum p\Big)^2 - \left\{\sum p^2 - \frac{1}{n}\Big(\sum p\Big)^2\right\} = n\bar{p} - n\bar{p}^2 - n\,\mathrm{var}\,p$$
(where $\mathrm{var}\,p$ is written for the variance of $p$)
$$= n\bar{p}\bar{q} - n\,\mathrm{var}\,p. \tag{5.26}$$

A comparison of these results with those for the binomial shows that the variance of the distribution (5.24) is less than that of the binomial with the same average $p$ by an amount equal to $n$ times the variance of $p$.

Similarly we see that, for the Poisson distribution in such a case
$$\mu'_1 = \bar{\lambda}, \text{ the mean of the } \lambda\text{'s}, \tag{5.27}$$
and
$$\mu_2 = \bar{\lambda} - \frac{1}{n}\mathrm{var}\,(np) = \bar{\lambda} \text{ to order } n^{-1}.$$
The Poisson form thus holds for (5.24) notwithstanding the inequality of the $p$'s, provided that the variance of $\lambda$ is small compared with $n$, which will be so if all the $p$'s are small.
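Result (5.26) may be checked exactly by multiplying out the factors of (5.24). The sketch below is ours, and the five proportions used are purely illustrative.

```python
from fractions import Fraction

def product_array(ps):
    # Coefficients of the product of (q_j + p_j t): frequencies of 0, 1, ..., n successes.
    freq = [Fraction(1)]
    for p in ps:
        q = 1 - p
        new = [Fraction(0)] * (len(freq) + 1)
        for j, f in enumerate(freq):
            new[j] += f * q        # a Q is drawn from this population
            new[j + 1] += f * p    # a P is drawn from this population
        freq = new
    return freq

ps = [Fraction(k, 10) for k in (1, 2, 3, 4, 5)]   # illustrative proportions
n = len(ps)
freq = product_array(ps)
mean = sum(j * f for j, f in enumerate(freq))
variance = sum((j - mean)**2 * f for j, f in enumerate(freq))

p_bar = sum(ps) / n
var_p = sum((p - p_bar)**2 for p in ps) / n
print(mean, n * p_bar)
print(variance, n * p_bar * (1 - p_bar) - n * var_p)
```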


5.11. Consider now the case when successive sets of $n$ are drawn from different populations characterised by $p_1, p_2, \ldots, p_k$. In the previous case we supposed any set of $n$ obtained by taking one from each of $n$ populations.

We now suppose that any set is drawn from one population only, but that different sets come from different populations. Our array of frequencies will now be
$$\frac{1}{k} \sum_{j=1}^{k} (q_j + p_j)^n \tag{5.28}$$
and evidently the moments of this array are the sums of the moments of the $(q_j + p_j)^n$, that is to say, from (5.4),
$$\mu'_1 = \frac{1}{k} \sum np,$$
$$\mu'_2 = \frac{1}{k} \sum \{np + n(n - 1)p^2\}.$$
Writing $\bar{p}$ for the mean of the $p$'s as before, we have
$$\mu'_1 = n\bar{p},$$
$$\begin{aligned}
\mu_2 &= n\bar{p} + \frac{1}{k} \sum n(n - 1)p^2 - n^2\bar{p}^2 \\
&= n\bar{p}\bar{q} + \frac{1}{k} \sum n(n - 1)p^2 - n(n - 1)\bar{p}^2 \\
&= n\bar{p}\bar{q} + n(n - 1)\left\{\frac{1}{k}\sum p^2 - \bar{p}^2\right\} \\
&= n\bar{p}\bar{q} + n(n - 1)\,\mathrm{var}\,p.
\end{aligned} \tag{5.29}$$
In this case the variance is greater than what it would be if the distribution were of the ordinary binomial type by an amount $n(n - 1)\,\mathrm{var}\,p$.
For the Poisson distribution we have, on taking limits,
$$\mu'_1 = \bar{\lambda}, \qquad \mu_2 = \bar{\lambda} + \mathrm{var}\,\lambda, \tag{5.30}$$
and here also the variance of the distribution is affected.
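The excess of variance in (5.29) may likewise be checked exactly. The sketch below is ours; three populations with illustrative proportions are given equal weight, as in the array (5.28).

```python
from fractions import Fraction
from math import comb

def mixture_array(ps, n):
    # (1/k) times the sum, over the k populations, of the terms of (q_j + p_j)^n
    k = len(ps)
    return [sum(comb(n, r) * (1 - p)**(n - r) * p**r for p in ps) / k
            for r in range(n + 1)]

ps = [Fraction(1, 10), Fraction(3, 10), Fraction(1, 2)]   # illustrative proportions
n = 8
freq = mixture_array(ps, n)
mean = sum(r * f for r, f in enumerate(freq))
variance = sum((r - mean)**2 * f for r, f in enumerate(freq))

p_bar = sum(ps) / len(ps)
var_p = sum((p - p_bar)**2 for p in ps) / len(ps)
print(mean, n * p_bar)
print(variance, n * p_bar * (1 - p_bar) + n * (n - 1) * var_p)
```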


5.12. The results of the two preceding sections enable us to discuss the occurrence of the binomial and the Poisson distributions in practice. An example has already been given in Table 5.1 of a distribution conforming to the simple binomial type. It is not easy to find material compiled outside the laboratory which does so. For example, suppose we regard the possession of blue eyes as a success, and take a number of samples of $n$ from the population of the United Kingdom in different localities. We should probably find that the proportions in these samples did not conform to the simple binomial form. The variance $npq$ calculated from the known $n$ and observed $p$ would probably turn out to be too small. If so we should conclude from (5.26) that the proportion $p$ varied from place to place in the population, the deficiency in the variance of the proportions observed being due to the variance of $p$ itself in the sections of the population from which the samples were chosen. We are assuming for the time being that these differences are not explicable on the basis of sampling fluctuation alone; but a full discussion will have to wait until later chapters.


5.13. The same effect is found in distributions which at first sight might be expected to be of the Poisson type. For example, suicide is a rare event and it might be supposed that if we took a series of large samples, say the population of the United Kingdom in successive years, the frequencies of suicides would follow the Poisson distribution. This, however, is not necessarily so, for all members of the population are not equally exposed to risk and the temptation to suicide may vary from year to year, e.g. being greater in years of trade depression. This inequality of risk is typical of one field in which the Poisson distribution has been freely applied, namely, industrial accidents. Table 5.3 shows, in the second column, the frequency of accidents occurring to women working on the manufacture of shells. The Poisson frequencies shown in the third column provide a very poor fit. The reason is that the liability of individuals to accident varies.


TABLE 5.3

Accidents to 647 women working on H.E. shells in 5 weeks.
(Greenwood and Yule (1920), J. Roy. Statist. Soc., 83, 255.)

Number of      Observed       Poisson Distribution     Distribution given
Accidents.     Frequency.     with same Mean.          by (5.33).

    0             447               406                     442
    1             132               189                     140
    2              42                45                      45
    3              21                 7                      14
    4               3                 1                       5
    5               2                 0.1                     2

TOTALS            647               648                     648


As a working hypothesis (cf. Greenwood and Yule, 1920) suppose that the population is composed of individuals with different degrees of accident proneness, represented by different values of $\lambda$ in a Poisson distribution; and suppose that in the population the distribution of $\lambda$ is given by
$$dF = \frac{c^p}{\Gamma(p)}\,e^{-c\lambda} \lambda^{p-1}\,d\lambda, \qquad 0 \le \lambda \le \infty. \tag{5.31}$$
There are theoretical reasons justifying this supposition.

The frequency of $j$ successes is then
$$\int_0^\infty \frac{c^p}{\Gamma(p)}\,e^{-c\lambda} \lambda^{p-1} \cdot \frac{e^{-\lambda}\lambda^j}{j!}\,d\lambda,$$
or the coefficient of $\theta^j$ in
$$\frac{c^p}{\Gamma(p)} \int_0^\infty e^{-c\lambda} \lambda^{p-1} e^{-\lambda} e^{\lambda\theta}\,d\lambda,$$
which, on the substitution of $(c + 1 - \theta)\lambda = u$, becomes
$$\left(\frac{c}{c + 1 - \theta}\right)^p. \tag{5.32}$$
The frequency of 0, 1, 2, ... successes is therefore
$$\left(\frac{c}{c + 1}\right)^p \left\{1,\ \frac{p}{c + 1},\ \frac{p(p + 1)}{2!\,(c + 1)^2},\ \ldots\right\}. \tag{5.33}$$
The mean is thus
$$\mu'_1 = \frac{p}{c}. \tag{5.34}$$
Similarly $\mu'_2 = \dfrac{p(p + c + 1)}{c^2}$, so that
$$\mu_2 = \frac{p(c + 1)}{c^2}. \tag{5.35}$$
If we now put the observed mean and variance of Table 5.3 equal to the values of (5.34) and (5.35) we have two equations which can be solved for $p$ and $c$. The distribution (5.33) can then be found. The frequencies are given in the fourth column of Table 5.3 and evidently give a much better agreement with the facts.
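The fitting process just described may be sketched as follows. This is an illustration of our own and not necessarily the computation of Greenwood and Yule: it takes the last class of Table 5.3 as exactly 5 accidents and uses the divisor 647 for the variance, so that the resulting frequencies can be expected to agree with the fourth column only approximately.

```python
from math import lgamma, exp, log

# Observed frequencies of Table 5.3 (last class taken as exactly 5: an assumption).
observed = {0: 447, 1: 132, 2: 42, 3: 21, 4: 3, 5: 2}
total = sum(observed.values())
mean = sum(j * f for j, f in observed.items()) / total
variance = sum(j * j * f for j, f in observed.items()) / total - mean**2

# Solve (5.34) and (5.35): mean = p/c and variance = p(c + 1)/c^2.
c = mean / (variance - mean)
p = c * mean

def frequency(j):
    # Term j of (5.33): (c/(c+1))^p * p(p+1)...(p+j-1) / (j! (c+1)^j), times the total.
    log_term = (p * log(c / (c + 1)) + lgamma(p + j) - lgamma(p)
                - lgamma(j + 1) - j * log(c + 1))
    return total * exp(log_term)

print(p, c)
for j in range(6):
    print(j, observed[j], round(frequency(j), 1))
```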


5.14. The interesting feature of the distribution (5.32) is that it is a binomial with negative index. In the approach adopted in 5.2 the index is necessarily positive; but it is often found that observational materials are represented by negatively indexed binomials. Yule (1910) * has given an illustration of this effect which does not depend on any arbitrary assumption about distributions such as that embodied in (5.31). Suppose, in fact, that we have a population subjected to recurring attacks of a disease, that $r$ attacks are fatal and that on the average one attack is fatal to a proportion $p$ of individuals at risk, the actual numbers succumbing varying as if the population were chosen at random from a larger population in which the proportion of survivors is $p$. Consider the proportion of individuals surviving 0, 1, . . . attacks at the $n$th exposure. Evidently this is the proportion of successes in samples of $n$ when the chance of success is $p$, i.e. $(q + p)^n$. The proportion of survivors at the end of $n$ exposures will be the sum of the first $r$ terms in this series.


* J. Roy. Statist. Soc., 73, 26. 


The proportion of survivors at the end of $(n - 1)$ exposures will be the sum of the first $r$ terms in $(q + p)^{n-1}$. Consequently the proportion dying during the $n$th exposure is the difference,
$$\begin{aligned}
&\sum_{j=0}^{r-1} \binom{n-1}{j} q^{n-1-j} p^j - \sum_{j=0}^{r-1} \binom{n}{j} q^{n-j} p^j \\
&\qquad = \sum_{j=0}^{r-1} \binom{n-1}{j} q^{n-1-j} p^j (1 - q) - \sum_{j=1}^{r-1} \binom{n-1}{j-1} q^{n-j} p^j \\
&\qquad = \sum_{j=0}^{r-1} \binom{n-1}{j} q^{n-1-j} p^{j+1} - \sum_{j=0}^{r-2} \binom{n-1}{j} q^{n-1-j} p^{j+1} \\
&\qquad = \binom{n-1}{r-1} q^{n-r} p^r.
\end{aligned}$$
Thus, since death does not commence till the $r$th exposure, for the values of $n$ from $r$ onwards we have the proportion of deaths
$$p^r,\quad rqp^r,\quad \frac{r(r + 1)}{2!} q^2 p^r,\ \ldots$$
i.e. successive terms in $p^r(1 - q)^{-r}$, a binomial with negative index. A law of this kind has been found to operate in experiments on the killing of bacteria by disinfectants.


The Hypergeometric Distribution


5.15. Consider now the generalisation of the approach of 5.2 when samples of $n$ are drawn from a population of $N$ individuals, where $N$ is not necessarily large. If we take a sample which contains $r$ P's and $n - r$ Q's, it will arise in a proportion
$$\binom{n}{r} \frac{Np(Np - 1) \ldots (Np - r + 1)\; Nq(Nq - 1) \ldots (Nq - n + r + 1)}{N(N - 1) \ldots (N - n + 1)} \tag{5.37}$$
of the possible samples. For there are $\binom{N}{n}$ ways of selecting the sample, and the $r$ P's can be chosen in $\binom{Np}{r}$ ways, and the $n - r$ Q's in $\binom{Nq}{n-r}$ ways, the expression given in (5.37) being equal to $\binom{Np}{r}\binom{Nq}{n-r} \Big/ \binom{N}{n}$.
Hence we are led to consider the discontinuous distribution
$$f = \frac{1}{N^{[n]}} \sum_{j=0}^{n} \binom{n}{j} (Np)^{[j]} (Nq)^{[n-j]}, \tag{5.38}$$
where $N^{[n]}$ is written for $N(N - 1) \ldots (N - n + 1)$, a form in which the analogy with the binomial (5.1) is evident. As $N$ tends to infinity the form (5.38) approaches the binomial.

The series
$$f(x) = \frac{1}{N^{[n]}} \sum_{j=0}^{n} \binom{n}{j} (Np)^{[j]} (Nq)^{[n-j]} x^j, \tag{5.39}$$
where $N^{[n]}$ denotes $N(N - 1) \ldots (N - n + 1)$, is equal to
$$\frac{(Nq)^{[n]}}{N^{[n]}}\,F(\alpha, \beta; \gamma; x),$$
that is to say, to the hypergeometric function, if
$$\alpha = -n, \qquad \beta = -Np, \qquad \gamma = Nq - n + 1. \tag{5.40}$$
The distribution (5.38) is therefore called hypergeometric. We have
$$F(\alpha, \beta; \gamma; x) = 1 + \frac{\alpha\beta}{\gamma}\frac{x}{1!} + \frac{\alpha(\alpha + 1)\beta(\beta + 1)}{\gamma(\gamma + 1)}\frac{x^2}{2!} + \ldots$$
and it is well known that this function satisfies the differential equation
$$x(1 - x)\frac{d^2F}{dx^2} + \{\gamma - (\alpha + \beta + 1)x\}\frac{dF}{dx} - \alpha\beta F = 0, \tag{5.41}$$
a fact which may be readily verified from the equation itself.
If in (5.39) we put $x = e^{\theta}$ ($\theta = it$) we evidently have the characteristic function of the distribution. On making this substitution in (5.41) we find, after some reduction and replacement of the values of $\alpha$, $\beta$, $\gamma$ by those of (5.40),
$$(1 - e^{\theta})\left\{\frac{d^2\phi}{d\theta^2} - (n + Np)\frac{d\phi}{d\theta} + nNp\,\phi\right\} - Nnp\,\phi + N\frac{d\phi}{d\theta} = 0. \tag{5.42}$$

Since $\phi = \sum \mu'_r \dfrac{\theta^r}{r!}$ we find, from the coefficient of $\theta^0$ in this expression,
$$-Nnp + N\mu'_1 = 0, \qquad \mu'_1 = np, \tag{5.43}$$
the same result as for the binomial. The mean of the hypergeometric series is independent of $N$.

Taking now the distribution about its mean, and hence substituting $e^{np\theta}\phi$ for $\phi$ in (5.42), we find
$$(1 - e^{\theta})\left\{\frac{d^2\phi}{d\theta^2} + \{n(p - q) - Np\}\frac{d\phi}{d\theta} + npq(N - n)\,\phi\right\} + N\frac{d\phi}{d\theta} = 0, \tag{5.44}$$
whence, identifying coefficients in $\theta$, $\theta^2$, $\theta^3$ we find
$$\mu_2 = \frac{npq(N - n)}{N - 1} \tag{5.45}$$
$$\mu_3 = \frac{npq(q - p)(N - n)(N - 2n)}{(N - 1)(N - 2)} \tag{5.46}$$
$$\mu_4 = \frac{npq(N - n)}{(N - 1)(N - 2)(N - 3)}\left[N(N + 1) - 6n(N - n) + 3pq\{N^2(n - 2) - Nn^2 + 6n(N - n)\}\right] \tag{5.47}$$
and generally, if $E$ denotes the operation of raising the order of a moment by unity, i.e. $E\mu_r = \mu_{r+1}$, we have
$$N\mu_{r+1} = \{(1 + E)^r - E^r\}\left[\mu_2 + \{n(p - q) - Np\}\mu_1 + npq(N - n)\mu_0\right]. \tag{5.48}$$
As we expect, when $N \to \infty$ these values tend to those of the binomial.
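The moment formulae (5.43) and (5.45) to (5.47) can be verified exactly for any particular hypergeometric series. The sketch below is ours; the values $N = 52$, $Np = 13$, $n = 13$ are chosen merely as an illustration.

```python
from fractions import Fraction
from math import comb

def hypergeometric(N, Np, n):
    # Frequency of j P's in a sample of n drawn from N individuals of whom Np are P's.
    return [Fraction(comb(Np, j) * comb(N - Np, n - j), comb(N, n)) for j in range(n + 1)]

def central_moment(freq, r):
    mean = sum(j * f for j, f in enumerate(freq))
    return sum((j - mean)**r * f for j, f in enumerate(freq))

N, Np, n = 52, 13, 13              # illustrative values only
p = Fraction(Np, N)
q = 1 - p
freq = hypergeometric(N, Np, n)

mean = sum(j * f for j, f in enumerate(freq))
mu2 = n*p*q*(N - n) / (N - 1)                                           # (5.45)
mu3 = n*p*q*(q - p)*(N - n)*(N - 2*n) / ((N - 1)*(N - 2))               # (5.46)
mu4 = (n*p*q*(N - n) / ((N - 1)*(N - 2)*(N - 3))
       * (N*(N + 1) - 6*n*(N - n)
          + 3*p*q*(N**2*(n - 2) - N*n**2 + 6*n*(N - n))))               # (5.47)

print(mean == n*p)
print(central_moment(freq, 2) == mu2)
print(central_moment(freq, 3) == mu3)
print(central_moment(freq, 4) == mu4)
```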




5.16. An example of the occurrence of the hypergeometric series in practice is given in Table 5.4, giving the frequency of occurrence of cards of a certain suit in hands of whist. Here $N$ is the number of cards in the pack, 52, and $n = 13$, $p = \frac{1}{4}$. The series is thus
$$\frac{1}{52^{[13]}} \sum_{j=0}^{13} \binom{13}{j}\, 13^{[j]}\, 39^{[13-j]},$$
giving the frequencies shown in the third column. This agreement appears to be reasonably good.


TABLE 5.4

Distribution of 3400 First Hands at Whist according to Number of Trumps in the Hand.
(K. Pearson, 1924, Biometrika, 16, 172.)

Number of Cards     Observed        Frequency of Hypergeometric
in the Hand.        Frequency.      Distribution.

 0                      35               43.5
 1                     290              272.3
 2                     696              700.0
 3                     937              973.5
 4                     851              811.3
 5                     444              ...
 6                     115              ...
 7                      21              ...
 8                      11              ...
 9 and over              0              ...

TOTALS                3400             3400
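The hypergeometric column of Table 5.4 can be recomputed from the series of 5.16. The sketch below is our own computation and makes no claim to reproduce the rounding of the published table; it pools 9 or more cards into the final class.

```python
from math import comb

# Observed frequencies of Table 5.4; the last class is "9 and over".
observed = [35, 290, 696, 937, 851, 444, 115, 21, 11, 0]
hands, N, n, suit = 3400, 52, 13, 13

# Proportion of hands of n cards holding exactly j cards of a given suit,
# multiplied by the number of hands observed.
theoretical = [hands * comb(suit, j) * comb(N - suit, n - j) / comb(N, n)
               for j in range(n + 1)]
theoretical = theoretical[:9] + [sum(theoretical[9:])]

for j, (obs, th) in enumerate(zip(observed, theoretical)):
    print(j, obs, round(th, 1))
```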


ay be used to 
i -curve of type 
Оли О О Л ар 0<2 <1 

to the distribution and obtainin: 


в areas of that curve from the B-tables 
method and an example are give: ы 


n in the preface to the Tables of the Incompl Details of the 


ete B-function, 
The Normal Distribution 


T standar minis ae Lus 
Р = е dy ло «а oy = 
2л 70) 
The slightly more general form 
1 aa(t—p')? 
M ТАМ Be а qu m ad (5.50) 
is known as the normal distribution. It is the most im | 


statistics. The expression (5.49) is of 


25 athe appropriate | m j 




We have for the characteristic function of (5.50) 


LIESS "" 
SOLE M ME M T s s (GU) 
ivi _ (2)! s. 
giving а | | n (5.52) 
Hare. = 0 
. — 126° 
апа Маи ле og fti 


so that 


Ke = fa = g? 55 
NE ess 1 К = К 5 . (5.54) 
We also have $\beta_2 = 3$, $\gamma_2 = 0$, which accounts for the standard adopted for mesokurtosis in 3.32.
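5.19. The distribution function of the normal distribution expressed in standard measure is the integral
$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}t^2}\,dt. \tag{5.55}$$
The integrand may be expanded and we have
$$F(x) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_0^x e^{-\frac{1}{2}t^2}\,dt = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\left\{x - \frac{x^3}{2 \cdot 3} + \frac{x^5}{2!\,2^2 \cdot 5} - \frac{x^7}{3!\,2^3 \cdot 7} + \text{etc.}\right\}. \tag{5.56}$$
This converges too slowly to be of use for other than small values of $x$.
If $x$ is large an asymptotic series may be employed. We have, integrating by parts,
$$1 - F(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-\frac{1}{2}t^2}\,dt = \frac{1}{\sqrt{2\pi}}\left\{\frac{e^{-\frac{1}{2}x^2}}{x} - \int_x^\infty \frac{e^{-\frac{1}{2}t^2}}{t^2}\,dt\right\},$$
and on repeating the partial integration,
$$1 - F(x) = \frac{e^{-\frac{1}{2}x^2}}{x\sqrt{2\pi}}\left\{1 - \frac{1}{x^2} + \frac{1 \cdot 3}{x^4} - \frac{1 \cdot 3 \cdot 5}{x^6} + \ldots + R_n\right\}, \tag{5.57}$$
where $R_n$ is less in absolute value than the last term taken into account.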
The most useful formula, however, is a continued fraction due to Laplace. Put 


"P 00 TEM VENE 
e(t) EXT HL zo o eH * + 




so that «(0) is the expression in curly brackets in (5.57). We then have 
l xw 1 3 
7 z2% x" —1?  s1-— |) 
= — {(1 — t) a(t) —1) 
Hence if a(t) = Ху; we have, identifying coefficients in t", 


SEO со 


л ә 
тз ОЕ Merit + Yr — Ya = . . . А . (5.58) 
1, 1 
H ЕЛА on PNE MEN 
Sr Yr-1 i „баг Yr+1 
22 у, 
4 x x 2 3 Ys 
Thus Zt = = — 
„ун EY Y mg, 
LYy 
Sex 2 3 n 
eta 24 qp tuis 


Now when / = 0, « reduces to yo, and we also see from (5.58) that as x —> 00, Yı 


=. Hence 
we have, from (5.57), 


c x 1 2 
1- Fe) = DEI ЖЕРЕ ОЛ j 


EFE т, edu? 
Чоо eee rae joo ово) 

The continued fraction thus gives the ratio of the frequency of the normal distribution
to the right of the point x (the "tail area") to the ordinate of the frequency-distribution
at that point.
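The three expressions for the tail area can be compared numerically. The following is a minimal sketch, not part of the original text: the function names, the number of terms and the depth of the continued fraction are illustrative assumptions, and the reference value is taken from the standard library's complementary error function.

```python
import math

def ordinate(x):
    """Ordinate of the normal curve in standard measure."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def tail_series(x, terms=60):
    """1 - F(x) from the power series (5.56); practical only for small x."""
    total, term = 0.0, x  # term holds (-1)^j x^(2j+1) / (2^j j!)
    for j in range(terms):
        total += term / (2 * j + 1)
        term *= -x * x / (2.0 * (j + 1))
    return 0.5 - total / math.sqrt(2.0 * math.pi)

def tail_asymptotic(x, terms=4):
    """1 - F(x) from the leading terms of the asymptotic series (5.57)."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= -(2 * j + 1) / (x * x)
    return ordinate(x) / x * total

def tail_continued_fraction(x, depth=200):
    """1 - F(x) from Laplace's continued fraction (5.59), evaluated bottom-up."""
    value = x  # truncation of the fraction at the chosen depth (an assumption)
    for n in range(depth, 0, -1):
        value = x + n / value
    return ordinate(x) / value

def tail_reference(x):
    """Reference tail area from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (0.5, 2.0, 4.0):
    print(x, tail_series(x), tail_continued_fraction(x), tail_reference(x))
```

The power series is adequate near the mean, the asymptotic series only far out in the tail, and the continued fraction serves for any positive x, which is in keeping with the remark that it is the most useful of the three.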

This expression was in fact used by Sheppard (1939, posthumous) in calculating his
superb tables of the normal function. These tables give, among other things:

(a) The ratio of the tail area to the bounding ordinate, to 12 places of decimals for
intervals 0.01.

(b) The same to 24 places for intervals 0.1.

(c) The negative natural logarithm of the tail area, to sixteen places, by intervals 0.1.

Tables which are sufficient for all ordinary purposes will be found in Tables for Statisticians
and Biometricians, Parts I and II. At the end of this volume we give some tables which
will suffice to illustrate the theory and examples given herein.


5.20. The shape of the normal curve is illustrated in Fig. 5.1. It is symmetrical and
unlimited in range, falling off to zero very rapidly as the variate increases. There are
points of inflection at unit distance on either side of the mean.


q" 


For the mean deviation we have

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} |x|\, e^{-\frac{1}{2}x^{2}}\, dx = \sqrt{\frac{2}{\pi}} = 0.79788\ldots \tag{5.60}$$

The variance is of course unity, because the distribution is expressed in standard measure.
The quartiles are distant 0.674,489,75 from the mean, as may be found from the Tables.
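These two constants are easily checked by machine. The sketch below is illustrative only: the midpoint rule, the cut-off at 10 and the step count are assumptions made for the example, and the quartile is taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_ordinate(x):
    """Ordinate of the normal curve in standard measure."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mean_deviation(upper=10.0, steps=200_000):
    """Midpoint-rule estimate of the integral of |x| times the ordinate."""
    h = upper / steps
    half = sum((i + 0.5) * h * normal_ordinate((i + 0.5) * h) for i in range(steps))
    return 2.0 * half * h  # the integrand is symmetrical about the mean

def quartile_distance():
    """Distance of the upper quartile from the mean."""
    return NormalDist(0.0, 1.0).inv_cdf(0.75)

print(mean_deviation(), math.sqrt(2.0 / math.pi))
print(quartile_distance())
```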


Fig. 5.1. The Normal Curve $y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^{2}}$. [Figure not reproduced.]


5.21. As an illustration of the occurrence in practice of a distribution which is very
close to the normal, the height data of Table 1.7 may be taken. Table 5.5 shows the
actual frequencies and those given by the normal curve with the same mean and standard
deviation (67.46 and 2.56 inches respectively).

The correspondence is evidently fairly good. It must, however, be noted that whereas
the theoretical distribution has infinite range, the practical distribution has not, since it
is impossible to have a negative height. In this particular case the relative frequency
of the normal distribution outside the range 57 to 77 inches is so small that the point is
unimportant; but when distributions of finite range are represented by those of infinite
range it is as well to remember that the fit near the tails may not be very close.


TABLE 5.5
Frequency-Distribution of 8585 Men according to Height (Table 1.7) compared with Theoretical
Frequencies of a Normal Distribution with the Same Mean and Variance.

Height      Observed     Theoretical      Height      Observed     Theoretical
(inches).   Frequency.   Frequency.       (inches).   Frequency.   Frequency.

57-              2             1          68-            1230         1234
58-              4             3          69-            1063          989
59-             14            11          70-             646          682
60-             41            33          71-             392          405
61-             83            88          72-             202          207
62-            169           200          73-              79           91
63-            394           395          74-              32           34
64-            669           669          75-              16           11
65-            990           976          76-               5            3
66-           1223          1227          77-               2            1
67-           1329          1326
                                          Totals         8585         8586


5.22. The normal distribution has had a curious history. It was first discovered
by De Moivre in 1733 as the limiting form of the binomial, but was apparently forgotten
and rediscovered later in the eighteenth century by workers engaged in investigating the
theory of probability and the theory of errors. The discovery that errors of observation
ought, on certain plausible hypotheses, to be distributed normally led to a general belief
that they were so distributed. The belief extended itself to distributions such as those
of height, in which the variate-value of an individual may be regarded [...]
of a large number of small effects. Vestiges of this dogma are still [...]
It was found in the latter half of the nineteenth century that the [...]
the normal type and it seemed [...]


5.23. Since the normal distribution may be considered as the limit of the binomial
it is natural to inquire into the limiting forms, if any, of the hypergeometric distribution.
From (5.38) we see that the difference between two successive terms in the series is

[expression illegible in the source]

The ratio of this difference to the (r + 1)th term is then

$$\frac{\Delta y_r}{y_r} = \frac{A + Br}{C + Dr + Er^{2}},$$

where the quantities A . . . E are constants. In the limit when the distribution is
expressed in standard measure, $\Delta y_r$ is the increment when r increases by a small quantity,
and we are thus led to consider the differential equation defining a frequency function

$$\frac{1}{f}\frac{df}{dx} = \frac{A + Bx}{C + Dx + Ex^{2}}. \tag{5.61}$$

This is the equation of a family of functions, the Pearson distributions, which will be
considered from a slightly different standpoint in the next chapter.


The Bivariate Binomial Distribution

5.24. In generalisation of the results of 5.2, consider the drawing of samples of n
from a population the individuals of which may or may not have two attributes, P and
not-P (= Q) and R and not-R (= S). Suppose that the proportions of the individuals
with attributes PR, QR, PS and QS are a, b, c and d respectively, where a + b + c + d = 1.
In exactly the same way as for the binomial case it is seen that the proportion of samples
with i PR's, j QR's, k PS's and l QS's is

$$\frac{n!}{i!\, j!\, k!\, l!}\, a^{i} b^{j} c^{k} d^{l}$$

and the distribution of samples is given by the multinomial form

$$f = (a + b + c + d)^{n}. \tag{5.62}$$

The distribution given by this form is bivariate, one variate being the number of P's and
the other the number of R's. The characteristic function of the distribution is

$$\phi = \left( a e^{i(t_1 + t_2)} + b e^{i t_2} + c e^{i t_1} + d \right)^{n}. \tag{5.63}$$

We have then

$$\frac{1}{n}\log\phi = \log\left\{ a + b + c + d + ai(t_1 + t_2) + bit_2 + cit_1 - \tfrac{1}{2}\left[ a(t_1 + t_2)^{2} + bt_2^{2} + ct_1^{2} \right] + \dots \right\}$$

$$= i(a+c)t_1 + i(a+b)t_2 - \tfrac{1}{2}\left\{ (a+c)(1-a-c)\,t_1^{2} + (a+b)(1-a-b)\,t_2^{2} + 2\left( a - (a+c)(a+b) \right) t_1 t_2 \right\} + \dots \tag{5.64}$$

From this it is seen that the mean of the variate corresponding to the occurrence of P's
is n(a + c), and that of the variate corresponding to the occurrence of the R's, n(a + b).
From the terms in $t_1^{2}$ and $t_2^{2}$ in the expansion of (5.64) we find also that the variances are
$n(a+c)(1-a-c)$ and $n(a+b)(1-a-b)$. If we now transfer the origin to the mean
of the variates we have

$$\log\phi = -\frac{n}{2}\left\{ (a+c)(1-a-c)\,t_1^{2} + (a+b)(1-a-b)\,t_2^{2} + 2\left( a - (a+c)(a+b) \right) t_1 t_2 \right\} + O(nt^{3}).$$

Thus when the distribution is expressed in standard measure and n allowed to tend
to infinity the characteristic function tends to the form

$$\log\phi = -\tfrac{1}{2}\left( t_1^{2} + t_2^{2} + 2\rho\, t_1 t_2 \right), \tag{5.65}$$

where

$$\rho = \frac{a - (a+c)(a+b)}{\left\{ (a+c)(1-a-c)(a+b)(1-a-b) \right\}^{1/2}}.$$

This, as was seen in Example 3.15, is the characteristic function of the bivariate form

$$dF = \frac{1}{2\pi\sqrt{1-\rho^{2}}} \exp\left\{ -\frac{1}{2(1-\rho^{2})}\left( x_1^{2} - 2\rho\, x_1 x_2 + x_2^{2} \right) \right\} dx_1\, dx_2 \tag{5.66}$$

and the multinomial form (5.62) tends to the form (5.66), which may be regarded as the
bivariate analogue of the normal distribution.




If the two attributes P and R are independent in the population; that is to say, the
proportion of P's among R's is the same as among the not-R's, we have

$$\frac{a}{a+b} = \frac{c}{c+d}$$

and hence

$$\frac{a}{a+b} = \frac{a+c}{a+b+c+d} = a + c,$$

so that $a - (a+b)(a+c) = 0$. Thus $\rho$ in equation (5.65) vanishes. In this case and
only in this case the distribution (5.66) becomes

$$dF = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x_1^{2}}\, dx_1 \cdot \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x_2^{2}}\, dx_2,$$

i.e. $x_1$ and $x_2$ are independent variables. This is what we should expect and, indeed, is
necessary if our use of the word "independent" in relation to attributes and frequency-distributions
is to be consistent.
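The limiting correlation of the bivariate binomial is a simple function of the four proportions. A minimal sketch follows, assuming the formula for ρ that accompanies (5.65); the function name and the numerical proportions are illustrative and do not come from the text.

```python
import math

def limiting_correlation(a, b, c, d):
    """Correlation of the limiting bivariate normal form for proportions
    a (PR), b (QR), c (PS) and d (QS), which must sum to unity."""
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("the four proportions must sum to unity")
    p = a + c  # proportion of individuals with attribute P
    r = a + b  # proportion of individuals with attribute R
    return (a - p * r) / math.sqrt(p * (1.0 - p) * r * (1.0 - r))

# Independent attributes: P occurs with proportion 0.3 and R with 0.6,
# so a = 0.3 * 0.6, b = 0.7 * 0.6, c = 0.3 * 0.4, d = 0.7 * 0.4.
print(limiting_correlation(0.18, 0.42, 0.12, 0.28))

# Complete association: every individual is either PR or QS.
print(limiting_correlation(0.5, 0.0, 0.0, 0.5))
```

With independent attributes the numerator a - (a + b)(a + c) vanishes, which is the case treated in the paragraph above.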


NOTES AND REFERENCES

For further formulae about the constants of the binomial distribution, including the
incomplete moments [expression illegible in the source], see Frisch (1926) and Romanovsky (1925).
Some of the results are given as exercises below. See also Haldane (1939). For the formulae of
the hypergeometric distribution see K. Pearson (1895 and 1924). On the distribution
functions of the binomial and the hypergeometric, see Camp (1924 and 1925).
On Poisson's distribution reference may be made to Whitaker (1914), "Student"
(1907 and 1919) and Morant (1921).

Camp, B. H. (1924), "Probability integrals for the point binomial," Biometrika, 16, 163.
Camp, B. H. (1925), "Probability integrals for the hypergeometrical series," Biometrika, 17, 61.
Frisch, R. (1926), see refs. to Chapter 3.
Haldane, J. B. S. (1939), "The cumulants and moments of the binomial," Biometrika, 31, 392.
Morant, G. (1921), "On random occurrences in space and time when followed by a closed
    interval," Biometrika, 13, 309.
Pearson, K. (1895), "Skew variation in homogeneous material," Phil. Trans. A., 186, 343.
Pearson, K. (1924a), "On the moments of the hypergeometrical series," Biometrika, 16, 157.
Pearson, K. (1924b), "On a certain double hypergeometrical series and its representation by
    continuous frequency surfaces," Biometrika, 16, 172.
Romanovsky, V. (1925), "On the moments of the hypergeometrical series," Biometrika, 17, 57.
Sheppard, W. F. (1939), The Probability Integral, British Association Mathematical Tables,
    vol. 7, Cambridge University Press.
Soper, H. E. (1929), Frequency Arrays, Cambridge University Press.
"Student" (1907), "On the error of counting with a haemacytometer," Biometrika, 5, 351.
"Student" (1919), "An explanation of deviations from Poisson's law in practice," Biometrika,
    12, 211.
Whitaker, L. (1914), "On Poisson's law of small numbers," Biometrika, 10, 36.


EXERCISES

5.1. Show that for the binomial distribution $(q + p)^{n}$

$$\kappa_{r+1} = pq\, \frac{d\kappa_r}{dp}, \qquad r \geq 1.$$

Hence, writing $c = pq$, $g = q - p$, show that the cumulants are n times the following values:

$$\kappa_2 = c; \quad \kappa_3 = cg; \quad \kappa_4 = c - 6c^{2}; \quad \kappa_5 = g(c - 12c^{2}); \quad \kappa_6 = c - 30c^{2} + 120c^{3};$$

$$\kappa_7 = g(c - 60c^{2} + 360c^{3}); \quad \kappa_8 = c - 126c^{2} + 1680c^{3} - 5040c^{4}.$$
(Cf. Frisch (1926) and Haldane (1939), who give formulae up to xi»). 
5.2. Show that for the incomplete moments about the mean of the binomial

[definition illegible in the source]

equation (5.12) holds, i.e.

[formula illegible in the source]

(Romanovsky, 1925.)

5.3. Writing $T_j = \binom{n}{j} p^{j} q^{n-j}$, show that the incomplete moments of the binomial
are given by

[formulae illegible in the source]

(Frisch, 1926. This is the generalisation of equation (5.11) to incomplete moments.)


5.4. Show that about the origin of the hypergeometric distribution

$$\mu'_{[r]} = \frac{n^{[r]}\, (Np)^{[r]}}{N^{[r]}}.$$

5.5. From equation (5.48) derive the recurrence formula for the moments of the binomial

$$\{(1 + E)^{r} - E^{r}\}(npq\,\mu_0 - p\,\mu_1) = \mu_{r+1}$$

and that for the Poisson distribution

$$\{(1 + E)^{r} - E^{r}\}\,\lambda\mu_0 = \mu_{r+1}$$

(K. Pearson, 1924.)


5.6. Show that if $y = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/2\sigma^{2}}$,

$$\int_{-\infty}^{\infty} y^{2}\, dx = \frac{1}{2\sigma\sqrt{\pi}}.$$

Hence, if a normal distribution is grouped in intervals with total frequency $N_1$, and $N_2$ is
the sum of the squares of frequencies, an estimate of $\sigma$ is

$$\frac{N_1^{2}}{2N_2\sqrt{\pi}} = 0.282\,095\, \frac{N_1^{2}}{N_2}.$$

For the height data of Table 1.7 show that this gives an estimate of $\sigma$ equal to 2.553, an
error of about 1 per cent.

(Yule (1938), Biometrika, 30, 1)
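The arithmetic of this exercise can be sketched as follows. The frequencies are the observed column of Table 5.5 (the height data of Table 1.7); the function name and the `width` argument, which allows for class-intervals other than unity, are assumptions added for the illustration.

```python
import math

# Observed frequencies of heights, classes 57- to 77- inches (Table 5.5).
HEIGHT_FREQUENCIES = [2, 4, 14, 41, 83, 169, 394, 669, 990, 1223, 1329,
                      1230, 1063, 646, 392, 202, 79, 32, 16, 5, 2]

def yule_sigma(frequencies, width=1.0):
    """Estimate of sigma from the total frequency N1 and the sum of the
    squared frequencies N2 of a grouped normal distribution."""
    n1 = sum(frequencies)
    n2 = sum(f * f for f in frequencies)
    return width * n1 * n1 / (2.0 * n2 * math.sqrt(math.pi))

print(sum(HEIGHT_FREQUENCIES))          # total frequency N1
print(yule_sigma(HEIGHT_FREQUENCIES))   # estimate of sigma in inches
```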


5.7. If a distribution of type (5.24) is represented approximately by a binomial
$(Q + P)^{n}$, show that

[formulae illegible in the source]

so that P [...] and hence is positive; consequently that n is positive.
If, however, the distribution is of type (5.28), then

[formula illegible in the source]

so that P, and hence n, may be negative.

("Student," 1919.)

5.8. The bivariate Poisson Series. Show that when a, b and c in equation (5.62)
are small but $na\ (= \lambda_1)$, [...] are finite, the distribution tends
to the form whose general term is

[expression illegible in the source]


CHAPTER 6 


STANDARD DISTRIBUTIONS—(2) 


6.1. In this chapter we continue the account, begun in the last, of the standard 
distributions of statistical theory. From the variety of forms assumed by the frequency- 
distributions of experience, as exemplified in Chapter 1, it is evident that an elastic system 
would be required to describe them all in mathematical terms. Three approaches will 
be considered herein: the first, due to Karl Pearson, seeks to ascertain a family of curves 
which will satisfactorily represent practical distributions ; the second, due to Bruns, Gram 
and Charlier, seeks to represent a given frequency function as a series of derivatives of 
the normal frequency function ; the third, due to Edgeworth, seeks for a transformation 
of the variate which will throw the distribution at least approximately into the normal form. 


Pearson Distributions 
6.2. It was noted in 5.23 that in the limiting case the hypergeometric series can
be expressed in the form

$$\frac{df}{dx} = \frac{(x - a)f}{b_0 + b_1 x + b_2 x^{2}}. \tag{6.1}$$

This equation may be considered from a slightly different standpoint. The unimodal
distributions of Chapter 1 suggest that it might be worth while examining the class of
frequency functions which (a) have a single mode, so that $df/dx$ vanishes at some point $x = a$;
(b) have smooth contact with the x-axis at the extremities, so that $df/dx$ vanishes when $f = 0$.
Evidently these conditions are in general obeyed by any distribution of the family (6.1).
In actual fact, as will be seen below, there are also solutions of (6.1) in particular cases
which are J- or U-shaped.

The family of frequency functions defined by (6.1) are known as Pearson distributions. 
Before obtaining explicit solutions of the equation, we consider certain general results 
which are true of all members of the system. We have immediately 


$$(b_0 + b_1 x + b_2 x^{2})\, df = (x - a) f\, dx$$

or

$$x^{n}(b_0 + b_1 x + b_2 x^{2})\, \frac{df}{dx}\, dx = x^{n}(x - a) f\, dx.$$

Integrating the left-hand side by parts over the range of the distribution, we find, assuming
that the integrals exist,

$$\left[ x^{n}(b_0 + b_1 x + b_2 x^{2}) f \right] - \int \{ n b_0 x^{n-1} + (n+1) b_1 x^{n} + (n+2) b_2 x^{n+1} \} f\, dx
  = \int x^{n+1} f\, dx - a \int x^{n} f\, dx. \tag{6.2}$$

Let us assume that the expression in square brackets vanishes at the extremities of the
distribution, i.e. that $\lim x^{n+2} f \to 0$ if the range is infinite. We then have, substituting
moments for integrals in (6.2):

$$-n b_0 \mu'_{n-1} - (n+1) b_1 \mu'_{n} - (n+2) b_2 \mu'_{n+1} = \mu'_{n+1} - a\mu'_{n},$$

or

$$n b_0 \mu'_{n-1} + \{(n+1) b_1 - a\}\mu'_{n} + \{(n+2) b_2 + 1\}\mu'_{n+1} = 0. \tag{6.3}$$

This equation permits of the determination of any moment from those of lower orders.
In fact, all moments can be expressed in terms of a, $b_0$, $b_1$ and $b_2$ and the moments $\mu'_0\ (= 1)$
and $\mu'_1$. Conversely we can express these four constants in terms of the moments $\mu'_1$ to $\mu'_4$,
or the three moments about the mean $\mu_2$ to $\mu_4$. Putting n = 0, 1, 2, 3, successively in
(6.3), we find equations for a, $b_0$, $b_1$, $b_2$ which result in

$$a = -\frac{\mu_3(\mu_4 + 3\mu_2^{2})}{A} = -\frac{\sqrt{\mu_2}\,\sqrt{\beta_1}\,(\beta_2 + 3)}{A'}$$

$$b_0 = -\frac{\mu_2(4\mu_2\mu_4 - 3\mu_3^{2})}{A} = -\frac{\mu_2(4\beta_2 - 3\beta_1)}{A'}$$

$$b_1 = -\frac{\mu_3(\mu_4 + 3\mu_2^{2})}{A} = -\frac{\sqrt{\mu_2}\,\sqrt{\beta_1}\,(\beta_2 + 3)}{A'} \tag{6.4}$$

$$b_2 = -\frac{2\mu_2\mu_4 - 3\mu_3^{2} - 6\mu_2^{3}}{A} = -\frac{2\beta_2 - 3\beta_1 - 6}{A'}$$

where

$$A = 10\mu_4\mu_2 - 18\mu_2^{3} - 12\mu_3^{2}, \qquad A' = 10\beta_2 - 18 - 12\beta_1. \tag{6.5}$$

It follows that a curve of the family (6.1) is completely determined by its first four
moments, $\mu'_1$ to $\mu'_4$. The origin, of course, is at the mean.
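As a hedged illustration of (6.4) and (6.5), the constants can be computed from the moments about the mean and checked against the recurrence (6.3). The function name is an assumption of the example; the second set of moments is that of the bean data used in Example 6.1.

```python
def pearson_constants(mu2, mu3, mu4):
    """Constants a, b0, b1, b2 of equation (6.1), origin at the mean."""
    big_a = 10.0 * mu4 * mu2 - 18.0 * mu2 ** 3 - 12.0 * mu3 ** 2
    a = -mu3 * (mu4 + 3.0 * mu2 ** 2) / big_a
    b0 = -mu2 * (4.0 * mu2 * mu4 - 3.0 * mu3 ** 2) / big_a
    b1 = a  # putting n = 0 in (6.3), with the origin at the mean, gives b1 = a
    b2 = -(2.0 * mu2 * mu4 - 3.0 * mu3 ** 2 - 6.0 * mu2 ** 3) / big_a
    return a, b0, b1, b2

# Normal distribution in standard measure: mu2 = 1, mu3 = 0, mu4 = 3,
# for which (6.1) should reduce to df/dx = -x f.
print(pearson_constants(1.0, 0.0, 3.0))

# Bean data of Example 6.1.
print(pearson_constants(3.238424951, -5.306566352, 50.999624044))
```

For the normal case the constants come out as a = b1 = b2 = 0 and b0 = -1, so that the differential equation is that of the normal curve in standard measure.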


6.3. In equation (6.1) the mode is evidently at the point x = a. We have then
for the Pearson measure of skewness (3.31)

$$\text{Sk} = -\frac{a}{\sqrt{\mu_2}} = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{10\beta_2 - 12\beta_1 - 18}, \tag{6.6}$$

the form given in 3.31.

Further, if we take an origin at the mode so that a = 0 we find

$$\frac{d^{2}f}{dx^{2}} = \frac{d}{dx}\left\{ \frac{xf}{b_0 + b_1 x + b_2 x^{2}} \right\}
  = \frac{f}{(b_0 + b_1 x + b_2 x^{2})^{2}} \left\{ (1 - b_2)x^{2} + b_0 \right\}. \tag{6.7}$$

Thus any points of inflection in the frequency curve are given by

$$x^{2} = -\frac{b_0}{1 - b_2}. \tag{6.8}$$

Hence there cannot be more than two of them, and if they exist, they are equidistant from
the mode. It is not to be inferred that a curve of the family cannot have a single point
of inflection, for one point corresponding to the solution of (6.8) may be outside the
permissible range of x.

6.4. By a simple transformation of the origin to the mode, (6.1) may be written

$$\frac{d}{dx}(\log f) = \frac{x - a}{B_0 + B_1(x - a) + B_2(x - a)^{2}}$$

or

$$\frac{d}{dX}(\log f) = \frac{X}{B_0 + B_1 X + B_2 X^{2}}. \tag{6.9}$$


The explicit expression of the frequency function f is thus a matter of integrating the right- 
hand side of (6.9). 

Following Pearson, we may distinguish three main types according as the denominator 
on the right in (6.9) has real roots of opposite sign, real roots of the same sign, or imaginary 
roots. Pearson also distinguished ten other types, some entirely trivial, when the B's 
take particular values.* 


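Type I

6.5. Let

$$B_0 + B_1 X + B_2 X^{2} = B_2(X + \alpha_1)(X - \alpha_2), \qquad \alpha_1, \alpha_2 > 0.$$

Then

$$\frac{d}{dX}(\log f) = \frac{X}{B_2(X + \alpha_1)(X - \alpha_2)}
  = \frac{\alpha_1}{B_2(\alpha_1 + \alpha_2)(X + \alpha_1)} + \frac{\alpha_2}{B_2(\alpha_1 + \alpha_2)(X - \alpha_2)},$$

giving

$$f = k\,(X + \alpha_1)^{\alpha_1/\{B_2(\alpha_1 + \alpha_2)\}}\,(\alpha_2 - X)^{\alpha_2/\{B_2(\alpha_1 + \alpha_2)\}}. \tag{6.10}$$

This is generally written in the form

$$f = k\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2}, \qquad \frac{m_1}{a_1} = \frac{m_2}{a_2}. \tag{6.11}$$

The range of the curve is from $-a_1$ to $a_2$ and by integrating between these values we find

$$1 = k\int_{-a_1}^{a_2}\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2} dx,$$

which, on putting $x = (a_1 + a_2)y - a_1$, reduces to

$$1 = k\,\frac{(a_1 + a_2)^{m_1 + m_2 + 1}}{a_1^{m_1} a_2^{m_2}} \int_{0}^{1} y^{m_1}(1 - y)^{m_2}\, dy
   = k\,\frac{(a_1 + a_2)^{m_1 + m_2 + 1}}{a_1^{m_1} a_2^{m_2}}\, B(m_1 + 1, m_2 + 1).$$

This determines k and we have

$$f = \frac{a_1^{m_1} a_2^{m_2}}{(a_1 + a_2)^{m_1 + m_2 + 1}\, B(m_1 + 1, m_2 + 1)}\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2}. \tag{6.12}$$

The origin here is the mode. Taking an origin at the start of the curve we have

$$f = \frac{x^{m_1}(a_1 + a_2 - x)^{m_2}}{(a_1 + a_2)^{m_1 + m_2 + 1}\, B(m_1 + 1, m_2 + 1)}$$

or again, measuring in units $(a_1 + a_2)$ times the original,

$$f = \frac{1}{B(m_1 + 1, m_2 + 1)}\, x^{m_1}(1 - x)^{m_2}, \qquad 0 \leq x \leq 1. \tag{6.13}$$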


6.6. In these expressions the a's are necessarily positive, but the m's may have any
value greater than -1. They cannot be less because the distribution function of (6.12)
or (6.13) would not then converge.

If $m_1, m_2 > 0$ the distribution is evidently unimodal and zero at its extremities. If
one of the m's is between 0 and 1 the corresponding terminal frequency is still zero, but
the frequency curve makes a sharp angle with the x-axis, for $df/dx$ is not zero at the terminal.
If one and only one m is less than zero the curve has an infinite ordinate and is thus
J-shaped. If both m's are less than zero the curve is U-shaped.

The condition that $B_0 + B_1 X + B_2 X^{2}$ shall have real roots of opposite sign is that
$B_0$ and $B_2$ are of opposite sign, which is equivalent to

$$\frac{B_1^{2}}{4B_0 B_2} < 0$$

or, in terms of $\beta_1$ and $\beta_2$, from (6.4),

$$\frac{\beta_1(\beta_2 + 3)^{2}}{4(2\beta_2 - 3\beta_1 - 6)(4\beta_2 - 3\beta_1)} < 0. \tag{6.14}$$

The quantity on the left was denoted by Pearson by the letter $\kappa$ [...] criterion [...]
which will occur again below. The [...] though (6.4) is true of [...]
moments about the origin, because the quantity [...].

The frequency function of the Type I curve is [...].
The distribution function, as may be seen from (6.13), [...] B-functions.

* The numbering of the types followed herein is that of Elderton (1938). Some variations occur in
earlier literature and the reader must not be surprised to find the normal curve referred to occasionally
as Type VII.

Type VI

6.7. If the roots of $B_0 + B_1 X + B_2 X^{2}$ are real and of the same sign it is easy to see,
in the manner of the preceding sections, that the frequency functions may be written in
the form

$$f = \frac{a^{q_1 - q_2 - 1}}{B(q_1 - q_2 - 1,\ q_2 + 1)}\, x^{-q_1}(x - a)^{q_2}, \qquad q_1 > q_2 + 1,\ q_2 > -1, \tag{6.15}$$

where the range lies from a to $\infty$ if a is positive and from $-\infty$ to a if a is negative. By
the simple transformation $y = a/x$ this reduces to the Type I form (6.13).

It will readily be verified that if $q_2 > 0$ the curves are unimodal with zero frequency
at the terminals. If $q_2 < 0$ the start is J-shaped and the distribution falls away [...]
at infinity. The distribution function may be expressed in terms of incomplete B-functions
and in this case the quantity $\kappa$ of (6.14) is greater than unity.

Type IV

6.8. If the roots of $B_0 + B_1 X + B_2 X^{2}$ are imaginary we have

$$\frac{d}{dX}(\log f) = \frac{X}{B_0 + B_1 X + B_2 X^{2}} = \frac{X}{B_2\{(X + \gamma)^{2} + \delta^{2}\}}
  = \frac{1}{2B_2}\,\frac{2(X + \gamma)}{(X + \gamma)^{2} + \delta^{2}} - \frac{\gamma}{B_2}\,\frac{1}{(X + \gamma)^{2} + \delta^{2}},$$

giving

$$\log f = \log k + \frac{1}{2B_2}\log\{(X + \gamma)^{2} + \delta^{2}\} - \frac{\gamma}{B_2\delta}\tan^{-1}\frac{X + \gamma}{\delta},$$

$$f = k\{(X + \gamma)^{2} + \delta^{2}\}^{1/(2B_2)} \exp\left\{ -\frac{\gamma}{B_2\delta}\tan^{-1}\frac{X + \gamma}{\delta} \right\}. \tag{6.16}$$


This is Pearson's Type IV and is usually written in the form

$$f = k\left(1 + \frac{x^{2}}{a^{2}}\right)^{-m} \exp\left(-\nu\tan^{-1}\frac{x}{a}\right), \qquad -\infty < x < \infty. \tag{6.17}$$

The distribution has unlimited range in both directions, tends to zero at infinity and is
unimodal. The calculation of its ordinates may be assisted by some tables by Comrie
(1939). The distribution function has to be found either by quadratures from the frequency
function or by the use of some tabulated integrals given in Tables for Statisticians and
Biometricians, Part I. For instance, for the constant k in (6.17) we have

$$\frac{1}{k} = \int_{-\infty}^{\infty}\left(1 + \frac{x^{2}}{a^{2}}\right)^{-m} e^{-\nu\tan^{-1}(x/a)}\, dx
  = a\int_{-\pi/2}^{\pi/2} \cos^{2m-2}\theta\; e^{-\nu\theta}\, d\theta
  = aF(2m - 2, \nu) \text{ in Pearson's notation.}$$

In this case the quantity $\kappa$ of (6.14) lies between 0 and 1.
The above are the three main types in the Pearsonian system. The remaining types 


are described briefly below. A number of results which the reader can easily verify for 
himself are given without proof. 


The Normal Distribution

6.9. If, in equation (6.9), $B_1 = B_2 = 0$, we have

$$\frac{d}{dX}(\log f) = \frac{X}{B_0}.$$

If this frequency function is to have a convergent distribution function, $B_0$ must be negative,
$= -\sigma^{2}$ say, and we get the familiar form

$$f = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/2\sigma^{2}}, \qquad -\infty < x < \infty.$$

Thus the normal distribution itself is one of Pearson's types.


Type II

6.10. If in equation (6.9) $B_1 = 0$ and $B_0$, $B_2$ are of opposite signs, the distribution,
a particular case of Type I, becomes of the character

$$f = \frac{1}{aB(\tfrac{1}{2}, m + 1)}\left(1 - \frac{x^{2}}{a^{2}}\right)^{m}, \qquad -a \leq x \leq a \tag{6.18}$$

(a here being different from the a of (6.9)).
In this case the criterion $\kappa$ of equation (6.14) is zero. The distribution is symmetrical
about the origin and ranges from -a to +a. If m > 0 it is unimodal with contact at
the terminals of the range; if m < 0 it is U-shaped. If m = 0 the distribution becomes

$$f = \frac{1}{2a}, \qquad -a \leq x \leq a, \tag{6.19}$$

the so-called "rectangular" distribution.


Type VII

6.11. If in (6.9) $B_1 = 0$ and $B_0$, $B_2$ are of the same sign, we find

$$f = \frac{1}{aB(\tfrac{1}{2}, m - \tfrac{1}{2})}\left(1 + \frac{x^{2}}{a^{2}}\right)^{-m}, \qquad -\infty < x < \infty. \tag{6.20}$$

The range is now unlimited in both directions. Here also the criterion $\kappa$ of (6.14) vanishes,
but the difference between this case and that of Type II lies in the fact that here $\beta_2 > 3$,
whereas in the Type II case $\beta_2 < 3$.

Type III 

6.12. If in (6.9) $B_2 = 0$ we obtain the distribution

$$f = \frac{p^{p+1}}{a\, e^{p}\, \Gamma(p + 1)}\left(1 + \frac{x}{a}\right)^{p} e^{-px/a}, \qquad -a \leq x < \infty, \tag{6.21}$$

this being the form with the origin at the mode. The curve is unlimited in one direction
(positive or negative as a is positive or negative). It is unimodal if p > 0 and J-shaped if
p < 0. The condition $B_2 = 0$, from (6.4), is equivalent to $2\beta_2 - 3\beta_1 - 6 = 0$, so that the
criterion $\kappa$ of (6.14) is infinite.

Type V

6.13. If the roots of $B_0 + B_1 X + B_2 X^{2}$ are equal, i.e. $\kappa = 1$, we arrive at the distribution

$$f = \frac{\gamma^{p-1}}{\Gamma(p - 1)}\, x^{-p}\, e^{-\gamma/x}, \qquad 0 \leq x < \infty, \tag{6.22}$$

which ranges from 0 to $\infty$ and is unimodal.
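Types VIII, IX, X, XI and XII

6.14. The remaining types are of a more special character still.

If in (6.9) $B_0 = 0$, $B_2 > 0$ we have

$$\text{Type VIII:} \quad f = \frac{1 - m}{a}\left(1 + \frac{x}{a}\right)^{-m}, \qquad 0 \leq m \leq 1, \quad -a \leq x \leq 0. \tag{6.23}$$

If $B_0 = 0$, $B_2 < 0$ we have

$$\text{Type IX:} \quad f = \frac{1 + m}{a}\left(1 + \frac{x}{a}\right)^{m}, \qquad -a \leq x \leq 0. \tag{6.24}$$

If $B_0 = B_2 = 0$ we have

$$\text{Type X:} \quad f = \frac{1}{\sigma}\, e^{-x/\sigma}, \qquad 0 \leq x < \infty. \tag{6.25}$$

If $B_0 = B_1 = 0$ we have

$$\text{Type XI:} \quad f = (m - 1)\, b^{m-1} x^{-m}, \qquad b \leq x < \infty. \tag{6.26}$$

Finally, as a particular case of Type I when $5\beta_2 - 6\beta_1 - 9 = 0$, equations (6.4) become
indeterminate. In this case we have

$$\text{Type XII:} \quad f = \left(\frac{a_1}{a_2}\right)^{m} \frac{1}{(a_1 + a_2)\, B(1 + m, 1 - m)} \left( \frac{1 + x/a_1}{1 - x/a_2} \right)^{m}, \qquad -a_1 \leq x \leq a_2. \tag{6.27}$$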


6.15. Pearson curves of Types I and III, and to a somewhat smaller extent, those
of Types V and VII, arise in the theory of sampling and would in any case have to be
studied in that theory. Apart from this, the principal use that has been made of the
distributions in the theory of statistics is in fitting them to observed distributions such as
those of Chapter 1. It has been found that in many cases the Pearson distributions provide
a remarkably good fit to observation.

A systematic account of the technique of fitting will be found in Elderton's Frequency
Curves and Correlation (1938). We will here merely indicate the general principles and
give one example of fitting in what is, perhaps, the most difficult case.


6.16. All the Pearson distributions are determined by the first four moments, $\mu'_1$
to $\mu'_4$ inclusive, except some of the degenerate types which are determined by fewer than
four moments. Pearson's method of fitting consists of
(1) determining the numerical values of the first four moments of the observed distribution;
(2) calculating the numerical values of $\beta_1$, $\beta_2$, $\kappa$ (equation 6.14) and hence determining
the type to which the distribution belongs;
(3) equating the observed moments to the moments of the appropriate distribution
expressed in terms of its parameters; and
(4) solving the resulting equations for those parameters, whereupon the distribution
is determined.
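Step (2) of the method can be sketched as follows. This is an illustration only: the function names are assumptions, only the three main types are distinguished, and the transitional cases (κ equal to 0 or 1, or a vanishing denominator) are merely flagged rather than resolved.

```python
import math

def pearson_criterion(mu2, mu3, mu4):
    """Return beta1, beta2 and the criterion kappa of (6.14) from the
    second, third and fourth moments about the mean."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    denominator = 4.0 * (4.0 * beta2 - 3.0 * beta1) * (2.0 * beta2 - 3.0 * beta1 - 6.0)
    if denominator == 0.0:
        return beta1, beta2, math.inf  # transitional case, e.g. Type III
    return beta1, beta2, beta1 * (beta2 + 3.0) ** 2 / denominator

def main_type(kappa):
    """Main Pearson type indicated by the criterion kappa."""
    if kappa < 0.0:
        return "I"
    if 0.0 < kappa < 1.0:
        return "IV"
    if 1.0 < kappa < math.inf:
        return "VI"
    return "transitional"

# Moments of the bean data used in Example 6.1.
beta1, beta2, kappa = pearson_criterion(3.238424951, -5.306566352, 50.999624044)
print(beta1, beta2, kappa, main_type(kappa))
```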


The following example will illustrate the process :— 
Example 6.1 


In Table 1.15 there are shown, in the column totals, a distribution of 9440 beans accord- 
ing to length. The figures are repeated in Table 6.1 on page 150. Required to fit a 
Pearson distribution to these data. 
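For the moments it is found that, with Sheppard's corrections,

$$\begin{aligned}
\mu'_1 \text{ (centre at 14.5)} &= -0.190{,}783{,}898 \\
\mu_2 &= 3.238{,}424{,}951 \\
\mu_3 &= -5.306{,}566{,}352 \\
\mu_4 &= 50.999{,}624{,}044 \\
\beta_1 &= 0.829{,}135{,}838, \qquad \sqrt{\beta_1} = -0.910{,}569 \\
\beta_2 &= 4.862{,}944{,}362
\end{aligned}$$

First of all, as to type. For the criterion $\kappa$ (6.14) we have

$$\kappa = \frac{\beta_1(\beta_2 + 3)^{2}}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} = \frac{51.262}{84.040} = 0.610.$$

This lies between 0 and 1 and hence the appropriate curve is Type IV. We have to determine
a, m, $\nu$ in

$$f = \frac{1}{aF(2m - 2, \nu)}\left(1 + \frac{x^{2}}{a^{2}}\right)^{-m} e^{-\nu\tan^{-1}(x/a)}.$$

Writing $\tan\theta = x/a$ and $2m - 2 = r$ we find

$$\mu'_n = \frac{a^{n}}{F(r, \nu)} \int_{-\pi/2}^{\pi/2} \cos^{r-n}\theta\, \sin^{n}\theta\; e^{-\nu\theta}\, d\theta,$$

whence, integrating by parts with $\cos^{r-n}\theta\,\sin\theta$ as one part,

$$\mu'_n = \frac{a}{r - n + 1}\left\{ (n - 1)\,a\,\mu'_{n-2} - \nu\,\mu'_{n-1} \right\},$$

a particular case of (6.3). Hence, in terms of moments about the mean,

$$\begin{aligned}
\mu'_1 &= -\frac{a\nu}{r} \\
\mu_2 &= \frac{a^{2}(r^{2} + \nu^{2})}{r^{2}(r - 1)} \\
\mu_3 &= -\frac{4a^{3}\nu(r^{2} + \nu^{2})}{r^{3}(r - 1)(r - 2)} \\
\mu_4 &= \frac{3a^{4}(r^{2} + \nu^{2})\{(r + 6)(r^{2} + \nu^{2}) - 8r^{2}\}}{r^{4}(r - 1)(r - 2)(r - 3)}
\end{aligned}$$

whence it is found that

$$r = \frac{6(\beta_2 - \beta_1 - 1)}{2\beta_2 - 3\beta_1 - 6}, \qquad
  \nu = -\frac{r(r - 2)\sqrt{\beta_1}}{\sqrt{16(r - 1) - \beta_1(r - 2)^{2}}}, \qquad
  a = \sqrt{\frac{\mu_2}{16}\left\{16(r - 1) - \beta_1(r - 2)^{2}\right\}}.$$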


a 


Substituting for $\beta_1$, $\beta_2$ and $\mu_2$ we find

$$r = 14.697{,}72, \qquad m = 8.348{,}86, \qquad \nu = 18.380{,}43, \qquad a = 4.159{,}49.$$

The signs here want a little watching. r and m present no difficulty; but a is to be taken
positive and $\nu$ positive since $\sqrt{\beta_1}$ is to be considered negative.

From the tables of F(r, $\nu$) we evaluate the constant term k and finally arrive at

$$f = 0.395121\left(1 + \frac{x^{2}}{(4.159{,}49)^{2}}\right)^{-8.348,86} e^{-18.380,43\,\tan^{-1}(x/4.159,49)}.$$

The frequencies given by this curve are shown in Table 6.1 on page 150.

6.17. The following points are worth noting in connection with the fitting of
curves to observational data:

(1) Although the various types [...]
between Types IV and VI and is very similar to the shape assumed by those curves
near $\kappa = 1$.

(2) It is tacitly assumed that the data can be represented by a curve with finite moments
up to the fourth order at least. Curves for which higher moments do not exist were
called by Pearson heterotypic; but there is nothing sinister about them except that they
do not fall within the Pearsonian system.

(3) In calculating moments, Sheppard's corrections are usually to be employed when
there are contacts of sufficiently high order at the terminals. In the case of J- or U-distributions
the other corrections mentioned in 3.27 may be employed. This case sometimes
raises difficulties in that the resultant curve does not start in the right place. In such
circumstances there is no golden rule. The most satisfactory course is to try several curves
(or the same curve translated to several points) and to judge by the results which of
them gives the best fit.

(4) The quadrature of Pearson curves, as indicated in the foregoing, may in some 
cases be effected by tabulated integrals; but the more generally applicable procedure 
appears to be to calculate ordinates direct from the equation of the curve and then to 
find areas in ranges by Simpson’s rule, Weddle’s rule, or some similar process of quadrature. 
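As a sketch of quadrature by Simpson's rule, the integral F(r, ν) of the Type IV curve may be evaluated directly from its definition. The parameter values are those of Example 6.1; the number of intervals is an arbitrary choice, and reading the constant 0.395121 of that example as N/(aF(r, ν)) with N = 9440 is an assumption of this illustration, not a statement of the text.

```python
import math

def simpson(func, lower, upper, intervals=2000):
    """Composite Simpson's rule with an even number of intervals."""
    if intervals % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = (upper - lower) / intervals
    total = func(lower) + func(upper)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * func(lower + i * h)
    return total * h / 3.0

def type_iv_integral(r, nu):
    """F(r, nu): integral of cos^r(theta) exp(-nu theta) over (-pi/2, pi/2)."""
    def integrand(theta):
        # max() guards against a tiny negative cosine from rounding at the ends
        return max(math.cos(theta), 0.0) ** r * math.exp(-nu * theta)
    return simpson(integrand, -math.pi / 2.0, math.pi / 2.0)

R, NU, A, N = 14.69772, 18.38043, 4.15949, 9440  # values from Example 6.1
scale = N / (A * type_iv_integral(R, NU))
print(scale)
```

Ordinates of the fitted curve multiplied by such a constant give frequencies rather than proportions; areas in ranges can then be found by applying the same `simpson` function to the curve itself.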


6.18. The mathematical description of an observed distribution by a Pearson curve 
may be regarded from two rather different standpoints. If our object (for instance in 
actuarial work) is to obtain a mathematical expression which will satisfactorily represent 
observation and allow of accurate graduation and interpolation, fitting by moments is 
generally satisfactory. The method has, however, been criticised when the observed data 
are regarded as samples from a population, and it is desired to find a mathematical repre- 
sentation of that population. In such cases the moments calculated from observation are 
only estimates of population-moments. It has been objected that they may be inefficient 
estimates, and alternative methods have been proposed. We shall have to defer a full 
discussion of this point until the second volume. 


6.19. Other systems of curves have been studied, mainly by Scandinavian writers, 
with a view to representing frequency functions by expansions in series. It is well known 
in mathematical and physical work that functions can often be usefully expressed as a 
series of terms such as powers of the variable (Taylor’s series) or trigonometrical functions 
(Fourier’s series). Neither of these forms is very suitable for frequency functions, but 
we proceed to consider another set of functions with more promising possibilities. 


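Tchebycheff-Hermite Polynomials

6.20. Writing

$$\alpha(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^{2}} \qquad \text{and} \qquad D = \frac{d}{dx},$$

consider successive derivatives of $\alpha(x)$ with respect to x. We have

$$D\alpha(x) = -x\,\alpha(x), \qquad D^{2}\alpha(x) = (x^{2} - 1)\,\alpha(x), \qquad D^{3}\alpha(x) = (3x - x^{3})\,\alpha(x),$$

and so on. The result will obviously be, in general, a polynomial in x multiplied by $\alpha(x)$.
We then define the Tchebycheff-Hermite polynomial $H_r(x)$ by the equation

$$(-D)^{r}\alpha(x) = H_r(x)\,\alpha(x). \tag{6.28}$$

Evidently $H_r(x)$ is of degree r in x and the coefficient of $x^{r}$ is unity. By convention $H_0 = 1$.
We have

$$\alpha(x - t) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\tfrac{1}{2}(x - t)^{2}\right\} = \alpha(x)\exp\left(tx - \tfrac{1}{2}t^{2}\right)$$

and also, by Taylor's theorem

$$\alpha(x - t) = \sum_{j=0}^{\infty} \frac{(-t)^{j}}{j!}\, D^{j}\alpha(x) = \sum_{j=0}^{\infty} \frac{t^{j}}{j!}\, H_j(x)\,\alpha(x).$$

Consequently $H_r(x)$ is the coefficient of $t^{r}/r!$ in $\exp\left(tx - \tfrac{1}{2}t^{2}\right)$. It follows that

$$H_r(x) = x^{r} - \frac{r^{[2]}}{2 \cdot 1!}\, x^{r-2} + \frac{r^{[4]}}{2^{2} \cdot 2!}\, x^{r-4} - \frac{r^{[6]}}{2^{3} \cdot 3!}\, x^{r-6} + \dots \tag{6.29}$$

The first ten polynomials are

$$\begin{aligned}
H_0 &= 1 \\
H_1 &= x \\
H_2 &= x^{2} - 1 \\
H_3 &= x^{3} - 3x \\
H_4 &= x^{4} - 6x^{2} + 3 \\
H_5 &= x^{5} - 10x^{3} + 15x \\
H_6 &= x^{6} - 15x^{4} + 45x^{2} - 15 \\
H_7 &= x^{7} - 21x^{5} + 105x^{3} - 105x \\
H_8 &= x^{8} - 28x^{6} + 210x^{4} - 420x^{2} + 105 \\
H_9 &= x^{9} - 36x^{7} + 378x^{5} - 1260x^{3} + 945x \\
H_{10} &= x^{10} - 45x^{8} + 630x^{6} - 3150x^{4} + 4725x^{2} - 945
\end{aligned} \tag{6.30}$$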


6.21. The polynomials have a number of interesting properties. Differentiating the
identity

$$\exp\left(tx - \tfrac{1}{2}t^{2}\right) = \sum_{j=0}^{\infty} \frac{t^{j}}{j!}\, H_j(x)$$

with respect to x and identifying coefficients in $t^{r}$ we have

$$\frac{d}{dx}H_r(x) = rH_{r-1}(x) \tag{6.31}$$

and generally

$$D^{j}H_r(x) = r^{[j]}H_{r-j}(x). \tag{6.32}$$

Differentiating the identity with respect to t and identifying coefficients we have

$$H_r(x) - xH_{r-1}(x) + (r - 1)H_{r-2}(x) = 0. \tag{6.33}$$

From (6.31) and (6.33) together we find

$$\frac{d^{2}H_r(x)}{dx^{2}} - x\,\frac{dH_r(x)}{dx} + rH_r(x) = 0. \tag{6.34}$$

It is also known that the equation in x, $H_r(x) = 0$, has r real roots, each not greater
in absolute value than $\sqrt{\tfrac{1}{2}r(r - 1)}$. (Cf. Charlier, 1931.)

Tables of the values of the first six polynomials to 10 decimal places proceeding by
x = 0 (0.01) 4 have been given by Jorgensen (1916).


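6.22. The polynomials have an important orthogonal property, namely, that
$$\int_{-\infty}^{\infty}H_m(x)H_n(x)\,\alpha(x)\,dx = \begin{cases}0, & m \neq n\\ n!, & m = n.\end{cases} \tag{6.35}$$
In fact, integrating by parts, we have, if $m \le n$,
$$\int_{-\infty}^{\infty}H_mH_n\,\alpha\,dx = (-1)^{n}\int_{-\infty}^{\infty}H_mD^{n}\alpha\,dx = (-1)^{n}\Big[H_mD^{n-1}\alpha\Big]_{-\infty}^{\infty} - (-1)^{n}\int_{-\infty}^{\infty}\frac{dH_m}{dx}D^{n-1}\alpha\,dx.$$
The term in square brackets vanishes and, in virtue of (6.31), the integral becomes
$$m(-1)^{n-1}\int_{-\infty}^{\infty}H_{m-1}D^{n-1}\alpha\,dx.$$
Continuing the process, we find either zero, if $m$ is not equal to $n$, or $m!$ if $m = n$.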
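The orthogonal property (6.35) can be checked exactly for small orders, since the integral of a polynomial against $\alpha(x)$ only needs the moments of the normal distribution, $E(x^{k}) = (k-1)(k-3)\dots 1$ for even $k$ and zero for odd $k$. The sketch below is illustrative only; the helper names are assumptions and the polynomials are built from the recurrence (6.33).

```python
from math import factorial


def hermite_coefficients(r):
    """Coefficients of H_r, lowest power first, from H_r = x*H_{r-1} - (r-1)*H_{r-2}."""
    older, prev = [1], [0, 1]
    if r == 0:
        return older
    for k in range(2, r + 1):
        shifted = [0] + prev
        scaled = [(k - 1) * c for c in older] + [0, 0]
        older, prev = prev, [a - b for a, b in zip(shifted, scaled)]
    return prev


def normal_moment(k):
    """E[x^k] for the standard normal: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0
    return factorial(k) // (2 ** (k // 2) * factorial(k // 2))


def hermite_inner_product(m, n):
    """Integral of H_m(x) H_n(x) alpha(x) dx, evaluated exactly in integers."""
    hm, hn = hermite_coefficients(m), hermite_coefficients(n)
    return sum(a * b * normal_moment(i + j)
               for i, a in enumerate(hm)
               for j, b in enumerate(hn))


if __name__ == "__main__":
    for m in range(5):
        print([hermite_inner_product(m, n) for n in range(5)])
```

The printed matrix is diagonal, with the factorials 1, 1, 2, 6, 24 on the diagonal.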


The Gram-Charlier Series of Type A 
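6.23. Suppose now that a frequency function can be expanded formally in series of derivatives of $\alpha(x)$. (We shall discuss the conditions under which such an expansion is valid below.) We have then
$$f(x) = \sum_{j=0}^{\infty}c_jH_j(x)\,\alpha(x).$$
Multiplying by $H_r(x)$ and integrating from $-\infty$ to $\infty$ we have, in virtue of the orthogonal relationship (6.35),
$$c_r = \frac{1}{r!}\int_{-\infty}^{\infty}f(x)H_r(x)\,dx. \tag{6.36}$$
The reader familiar with harmonic analysis will recognise the resemblance between this procedure and the evaluation of constants in a Fourier series.
Substituting in (6.36) the explicit value of $H_r(x)$ given in (6.29) we find
$$c_r = \frac{1}{r!}\left\{\mu'_r - \frac{r^{[2]}}{2\cdot 1!}\mu'_{r-2} + \frac{r^{[4]}}{2^{2}\cdot 2!}\mu'_{r-4} - \dots\right\}. \tag{6.37}$$
In particular, for moments about the mean,
$$\begin{aligned}
c_0 &= 1\\
c_1 &= 0\\
c_2 &= \tfrac{1}{2}(\mu_2 - 1)\\
c_3 &= \tfrac{1}{6}\mu_3\\
c_4 &= \tfrac{1}{24}(\mu_4 - 6\mu_2 + 3)\\
c_5 &= \tfrac{1}{120}(\mu_5 - 10\mu_3)\\
c_6 &= \tfrac{1}{720}(\mu_6 - 15\mu_4 + 45\mu_2 - 15)\\
c_7 &= \tfrac{1}{5040}(\mu_7 - 21\mu_5 + 105\mu_3)\\
c_8 &= \tfrac{1}{40320}(\mu_8 - 28\mu_6 + 210\mu_4 - 420\mu_2 + 105).
\end{aligned} \tag{6.38}$$
Thus we find the formal expansion
$$f(x) = \alpha(x)\left\{1 + \tfrac{1}{2}(\mu_2 - 1)H_2 + \tfrac{1}{6}\mu_3H_3 + \tfrac{1}{24}(\mu_4 - 6\mu_2 + 3)H_4 + \dots\right\}. \tag{6.39}$$
If $f(x)$ is in standard measure the series becomes
$$f(x) = \alpha(x)\left\{1 + \tfrac{1}{6}\mu_3H_3 + \tfrac{1}{24}(\mu_4 - 3)H_4 + \dots\right\}. \tag{6.40}$$
This is the so-called Gram-Charlier series of Type A.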
Edgeworth’s Form of the Type A Series 
6.24. Consider the Fourier transform of a term $H_r(x)\alpha(x)$. Since
$$\sqrt{2\pi}\,\alpha(t) = e^{-\frac{1}{2}t^{2}} = \int_{-\infty}^{\infty}e^{itx}\alpha(x)\,dx$$
we have
$$\sqrt{2\pi}\,\frac{d^{r}}{dt^{r}}\alpha(t) = (-1)^{r}\sqrt{2\pi}\,H_r(t)\alpha(t) = \int_{-\infty}^{\infty}(ix)^{r}e^{itx}\alpha(x)\,dx$$
and thus the characteristic function of $x^{r}\alpha(x)$ is $i^{r}\sqrt{2\pi}\,H_r(t)\alpha(t)$.
Conversely, by the Inversion Theorem of 4.3, we have
$$x^{r}\alpha(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\,i^{r}\sqrt{2\pi}\,H_r(t)\alpha(t)\,dt.$$
Interchanging $x$ and $t$, we find
$$\sqrt{2\pi}\,(-i)^{r}t^{r}\alpha(t) = \int_{-\infty}^{\infty}e^{-itx}H_r(x)\alpha(x)\,dx$$
and hence, changing the sign of $t$, that the transform of $H_r(x)\alpha(x)$ is $\sqrt{2\pi}\,(it)^{r}\alpha(t)$.
Consider now the expression
$$\exp(\kappa_rD^{r})\,\alpha(x). \tag{6.41}$$
Its characteristic function is
$$\begin{aligned}
\int_{-\infty}^{\infty}e^{itx}\exp(\kappa_rD^{r})\alpha(x)\,dx
&= \int_{-\infty}^{\infty}e^{itx}\sum_{j=0}^{\infty}\frac{\kappa_r^{j}D^{rj}}{j!}\alpha(x)\,dx\\
&= \sum_{j=0}^{\infty}\frac{\kappa_r^{j}}{j!}\int_{-\infty}^{\infty}e^{itx}D^{rj}\alpha(x)\,dx\\
&= \sum_{j=0}^{\infty}\frac{\kappa_r^{j}}{j!}\int_{-\infty}^{\infty}e^{itx}(-1)^{rj}H_{rj}(x)\alpha(x)\,dx\\
&= \sum_{j=0}^{\infty}\frac{\kappa_r^{j}}{j!}\sqrt{2\pi}\,(-it)^{rj}\alpha(t)\\
&= \sqrt{2\pi}\,\alpha(t)\exp\{\kappa_r(-it)^{r}\}.
\end{aligned} \tag{6.42}$$
In a similar way it will be seen that the characteristic function of
$$\exp\left(-\frac{\kappa_3D^{3}}{3!} + \frac{\kappa_4D^{4}}{4!} - \frac{\kappa_5D^{5}}{5!} + \dots\right)\alpha(x) \tag{6.43}$$
is equal to
$$\sqrt{2\pi}\,\alpha(t)\exp\left\{\frac{\kappa_3(it)^{3}}{3!} + \frac{\kappa_4(it)^{4}}{4!} + \dots\right\}. \tag{6.44}$$
More generally, if
$$\alpha(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x - m)^{2}}{2\sigma^{2}}\right\}$$
the characteristic function of
$$\exp\left\{-\frac{aD}{1!} + \frac{bD^{2}}{2!} - \frac{\kappa_3D^{3}}{3!} + \frac{\kappa_4D^{4}}{4!} - \dots\right\}\alpha(x) \tag{6.45}$$
is equal to
$$\exp(imt - \tfrac{1}{2}\sigma^{2}t^{2})\exp\left\{a(it) + \frac{b(it)^{2}}{2!} + \frac{\kappa_3(it)^{3}}{3!} + \dots\right\} \tag{6.46}$$
as may be seen by the same line of argument.

Now suppose that (6.45) represents a frequency function. Its cumulative function is then the logarithm of (6.46), i.e. is equal to
$$imt - \tfrac{1}{2}\sigma^{2}t^{2} + a(it) + \frac{b(it)^{2}}{2!} + \frac{\kappa_3(it)^{3}}{3!} + \dots$$
and hence its cumulants are $a + m$, $b + \sigma^{2}$, $\kappa_3$, $\kappa_4$, $\kappa_5$, etc. We may take $a = \kappa_1 - m$ and $b = \kappa_2 - \sigma^{2}$ and thus we obtain a distribution whose cumulants are $\kappa_1, \kappa_2, \dots$ etc. Now if these are in fact the cumulants of a distribution the series (6.45) must be equal to that distribution, provided that (1) the series converges to a frequency function, and (2) it is uniquely determined by its moments.

If we take the frequency function to be expressed in standard measure, then $\kappa_1 = 0$, $\kappa_2 = 1$ and (6.45) becomes
$$\exp\left(-\frac{\kappa_3D^{3}}{3!} + \frac{\kappa_4D^{4}}{4!} - \dots\right)\alpha(x) = f(x) \tag{6.47}$$
where $\alpha(x)$ is again the function of 6.20, because now $m$ vanishes and $\sigma^{2} = 1$.
A series of this kind was derived by Edgeworth (1904), though from an entirely different approach through the theory of elementary errors. Equation (6.47) is formally identical with (6.40), and the reader who consults the original memoirs on this subject may be puzzled by the fact that Edgeworth claimed his series to be different from the Type A series and better as a representation of frequency functions. The explanation is that for practical purposes it is necessary to take only a finite number of terms in the series and to neglect the remainder. If we take the first $k$ terms in (6.40) the result is in general different from that obtained by taking the first $(k - 1)$ terms of the operator in the exponential of (6.47). The argument centred on the fact (cf. Example 6.3 below) that the terms in (6.40) do not tend regularly to zero from the point of view of elementary errors, so that in general no term is negligible compared with a preceding term.


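6.25. In standard measure the relations (6.38) become, in terms of cumulants,
$$\begin{aligned}
c_0 &= 1, \qquad c_1 = c_2 = 0\\
c_3 &= \tfrac{1}{6}\kappa_3\\
c_4 &= \tfrac{1}{24}\kappa_4\\
c_5 &= \tfrac{1}{120}\kappa_5\\
c_6 &= \tfrac{1}{720}(\kappa_6 + 10\kappa_3^{2})\\
c_7 &= \tfrac{1}{5040}(\kappa_7 + 35\kappa_4\kappa_3)\\
c_8 &= \tfrac{1}{40320}(\kappa_8 + 56\kappa_5\kappa_3 + 35\kappa_4^{2}).
\end{aligned} \tag{6.48}$$

6.26. In the practical representation of frequency functions by the Type A series only the first few terms can be taken into account. The term in $H_r(x)$ has a coefficient dependent on $\mu_r$, and for $r > 4$ this is unreliable owing to sampling fluctuations. When sampling effects are not in question the series may be taken to more terms, usually not higher than the term in $H_6$. We should then have to investigate how far the observed distribution can be represented by the series
$$\alpha(x)\left\{1 + \frac{\kappa_3}{6}H_3 + \frac{\kappa_4}{24}H_4 + \frac{\kappa_5}{120}H_5 + \frac{\kappa_6 + 10\kappa_3^{2}}{720}H_6\right\} \tag{6.49}$$
in the hope that the remainder after these terms could be neglected in comparison.

It may be noted in passing that the distribution function of such a series is easy to obtain. If
$$f(x) = \sum_{r} a_rH_r(x)\,\alpha(x)$$
then, since $\int_{-\infty}^{x}H_r\alpha\,dx = -H_{r-1}(x)\alpha(x)$ for $r \ge 1$,
$$\int_{-\infty}^{x}f(x)\,dx = a_0\int_{-\infty}^{x}\alpha(x)\,dx - \sum_{r \ge 1}a_rH_{r-1}(x)\,\alpha(x). \tag{6.50}$$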


TABLE 6.1 


Fitting of Pearson Type IV Distribution and Gram-Charlier Type A Series to the Data of 
Length of Beans (Table 1.15). 


(From Pretorius, 1930.) 


Length of Observed Type A. T A. 
Type IV. ype A, Type A, 
eum Frequency. УР (1) (2) kon A 
| = ES 
| S| 
— — == 16:3 ат 
F. E | { { 15-2 { 20 ^ 
17: 6 14 12:8 13. 
105 55 { 28-5 { 25-6 | 164 — 353 
16-0 215 299.3 241-7 370.4 22.3 
15-5 1129 1181-6 1012-7 926-2 438.1 
15-0 2082 2132-6 2155-4 1833.9 1214.9 
14:5 2294 2229-8 2593-0 2506-4 1866-9 
14-0 1787 1638-9 1788-4 2082.6 2112.8 
13:5 929 968-9 713-4 921.3 1916.7 
13:0 437 503-6 280-7 199-9 1183.4 
12.5 199 243-7 258-7 132.1 371.9 
12:0 115 113:8 206-2 178-1 66-9 
11:5 т0 52-5 98-7 117-0 101.2 
11-0 36 24-2 29-6 43.5 107-1 
10-5 18 11:3 5-9 155 54. 
10-0 7 54 154 
9-5 1 9.6 { 9 ee 
v aw ү {з 
| rl 
Torars 9440 9440 9440 9440 [35g E 
eae 9440 
(The brackets mean that the frequencies shown are rounded up and include some small frequency in blank rows covered by the brackets.)


Example 6.2 


Consider the fitting of a Type A series to the bean data of Example 6.1.
We have already found the first four moments. In standard measure we have
$$\mu_3 = -0.910{,}569, \qquad \mu_4 = 4.862{,}944$$
and we also find
$$\mu_5 = -12.574{,}125, \qquad \mu_6 = 53.221{,}083.$$
Hence the series is
$$9440\,\alpha(x)\{1 - 0.151{,}762\,H_3 + 0.077{,}622{,}7\,H_4 - 0.028{,}903{,}6\,H_5 + 0.014{,}273{,}5\,H_6\}.$$

Table 6.1, due to Pretorius (1930), shows the frequencies given by taking the first three, the first four and the first five terms of this series (columns headed Type A(1), Type A(2) and Type A(3) respectively). A glance at the figures will show that the four- and five-term series are no better than the three-term and, if anything, rather worse. Furthermore, the five-term series gives negative frequencies at one terminal and a mode at 12 mm., which is contrary to the data. The representation is clearly not very satisfactory and no better than that given by the Pearson Type IV curve.
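The coefficients quoted in this example follow from (6.36): $c_r$ is the mean value of $H_r(x)$ divided by $r!$, which in standard measure reduces to the relations (6.38) with $\mu_1 = 0$, $\mu_2 = 1$. The sketch below recomputes them from the four moments given above; the function names are illustrative assumptions and exact rational arithmetic is used only to avoid rounding noise.

```python
from fractions import Fraction
from math import factorial


def hermite_coefficients(r):
    """Coefficients of H_r, lowest power first, via H_r = x*H_{r-1} - (r-1)*H_{r-2}."""
    older, prev = [1], [0, 1]
    if r == 0:
        return older
    for k in range(2, r + 1):
        shifted = [0] + prev
        scaled = [(k - 1) * c for c in older] + [0, 0]
        older, prev = prev, [a - b for a, b in zip(shifted, scaled)]
    return prev


def type_a_coefficient(r, moments):
    """c_r = E[H_r(x)] / r!, where moments[k] is the k-th moment E[x^k]."""
    h = hermite_coefficients(r)
    return sum(a * moments[k] for k, a in enumerate(h)) / factorial(r)


# Moments in standard measure: mu_0 = 1, mu_1 = 0, mu_2 = 1, then the
# values of mu_3 .. mu_6 quoted in Example 6.2.
bean_moments = [Fraction(1), Fraction(0), Fraction(1),
                Fraction("-0.910569"), Fraction("4.862944"),
                Fraction("-12.574125"), Fraction("53.221083")]

bean_coefficients = {r: float(type_a_coefficient(r, bean_moments))
                     for r in range(3, 7)}

if __name__ == "__main__":
    for r, c in bean_coefficients.items():
        print(f"c_{r} = {c:.7f}")
```

The four printed values agree with the coefficients of $H_3, \dots, H_6$ in the series above to the number of figures quoted.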




Tetrachoric Functions
6.27. The terms $H_r(x)\alpha(x)$ may be obtained from Jørgensen's tables combined with those of the exponential $e^{-\frac{1}{2}x^{2}}$. Some related functions have also been tabulated in Tables for Statisticians and Biometricians, Parts I and II. The function
$$\tau_r(x) = \frac{(-1)^{r-1}D^{r-1}\alpha(x)}{\sqrt{r!}} = \frac{H_{r-1}(x)\,\alpha(x)}{\sqrt{r!}}$$
is known as the Tetrachoric Function of order $r$, and tables are available to seven places of decimals for r = 0 (1) 30 and x = 0 (0.1) 4. In the notation of these functions, series (6.49) would become
$$f(x) = \tau_1(x) + \sqrt{\tfrac{2}{3}}\,\kappa_3\tau_4(x) + \sqrt{\tfrac{5}{24}}\,\kappa_4\tau_5(x) + \dots \tag{6.51}$$
and the particular series of Example 6.2 would be
$$f(x) = 9440\{\tau_1(x) - 0.743{,}477\,\tau_4(x) + 0.850{,}313\,\tau_5(x) - 0.775{,}565\,\tau_6(x) + 1.013{,}318\,\tau_7(x)\}.$$
The reason for the definition and the name of the function will appear in Chapter 14.
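Since $\tau_{r+1}(x) = H_r(x)\alpha(x)/\sqrt{(r+1)!}$, a term $c_rH_r(x)\alpha(x)$ of the Type A series becomes $c_r\sqrt{(r+1)!}\,\tau_{r+1}(x)$ in tetrachoric notation. The sketch below applies that conversion to the rounded coefficients of Example 6.2 and also evaluates $\tau_r(x)$ directly; the function names are illustrative assumptions.

```python
from math import exp, factorial, pi, sqrt


def hermite_value(r, x):
    """H_r(x) from the recurrence H_r = x*H_{r-1} - (r-1)*H_{r-2}."""
    older, prev = 1.0, x
    if r == 0:
        return older
    for k in range(2, r + 1):
        older, prev = prev, x * prev - (k - 1) * older
    return prev


def tetrachoric(r, x):
    """tau_r(x) = H_{r-1}(x) * alpha(x) / sqrt(r!), for r >= 1."""
    alpha = exp(-0.5 * x * x) / sqrt(2.0 * pi)
    return hermite_value(r - 1, x) * alpha / sqrt(factorial(r))


def tetrachoric_coefficient(r, c_r):
    """Coefficient of tau_{r+1} equivalent to the term c_r * H_r * alpha."""
    return c_r * sqrt(factorial(r + 1))


# Rounded Type A coefficients of H_3 .. H_6 from Example 6.2.
type_a = {3: -0.151762, 4: 0.0776227, 5: -0.0289036, 6: 0.0142735}
tetra = {r + 1: tetrachoric_coefficient(r, c) for r, c in type_a.items()}

if __name__ == "__main__":
    for order, coeff in tetra.items():
        print(f"coefficient of tau_{order}: {coeff:.6f}")
```

The printed coefficients of $\tau_4, \dots, \tau_7$ reproduce those of the tetrachoric form of the bean series to within rounding of the inputs.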


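6.28. Up to this point it has been assumed that a frequency function possesses a convergent Type A series. We shall not here enter into a discussion of the conditions under which this is so, except to warn the reader that a great many mistakes have been made on the subject and to quote some theorems without proof.

(1) Cramér (1926). If $f(x)$ is a function which has a continuous derivative such that
$$\int_{-\infty}^{\infty}\left(\frac{df}{dx}\right)^{2}e^{\frac{1}{2}x^{2}}\,dx$$
converges, and if $f(x)$ tends to zero as $|x|$ tends to infinity, then $f(x)$ may be developed in series
$$f(x) = \sum_{j=0}^{\infty}c_jH_j(x)\,\alpha(x) \tag{6.52}$$
where $c_j$ is given by
$$c_j = \frac{1}{j!}\int_{-\infty}^{\infty}f(x)H_j(x)\,dx.$$
This series is absolutely and uniformly convergent for $-\infty < x < \infty$.

(2) A theorem by Cramér (1926) based on one by Galbrun. If $f(x)$ is of bounded variation in every finite interval and if
$$\int_{-\infty}^{\infty}|f(x)|\,e^{\frac{1}{4}x^{2}}\,dx$$
exists, then the expansion of $f(x)$ in the series (6.52) converges everywhere to the sum $\frac{1}{2}\{f(x + 0) + f(x - 0)\}$. The convergence is uniform in every finite interval of continuity.

Cramér has also shown that this last theorem cannot be substantially improved upon as regards the behaviour of $f(x)$ at infinity. Consider in fact the function $f(x) = e^{-\lambda x^{2}}$. We have, in virtue of (6.33) and (6.31),
$$\begin{aligned}
\int_{-\infty}^{\infty}e^{-\lambda x^{2}}H_r(x)\,dx
&= \int_{-\infty}^{\infty}e^{-\lambda x^{2}}\{xH_{r-1} - (r - 1)H_{r-2}\}\,dx\\
&= \left[-\frac{1}{2\lambda}e^{-\lambda x^{2}}H_{r-1}\right]_{-\infty}^{\infty} + \frac{r - 1}{2\lambda}\int_{-\infty}^{\infty}e^{-\lambda x^{2}}H_{r-2}\,dx - (r - 1)\int_{-\infty}^{\infty}e^{-\lambda x^{2}}H_{r-2}\,dx\\
&= (r - 1)\left(\frac{1}{2\lambda} - 1\right)\int_{-\infty}^{\infty}e^{-\lambda x^{2}}H_{r-2}(x)\,dx.
\end{aligned}$$
If $r$ is odd the integral vanishes because $H_r$ is an odd function of $x$. If $r$ is even, say $2r$, the integral becomes
$$(2r - 1)(2r - 3)\dots 1\left(\frac{1}{2\lambda} - 1\right)^{r}\sqrt{\frac{\pi}{\lambda}} = \frac{(2r)!}{2^{r}r!}\left(\frac{1}{2\lambda} - 1\right)^{r}\sqrt{\frac{\pi}{\lambda}}.$$
The appropriate coefficient of $H_{2r}$ in the Type A series is then
$$\frac{1}{2^{r}r!}\left(\frac{1}{2\lambda} - 1\right)^{r}\sqrt{\frac{\pi}{\lambda}}.$$
Now when $x = 0$, $H_{2r} = (-1)^{r}\dfrac{(2r)!}{2^{r}r!}$. The series then becomes
$$\sqrt{\frac{\pi}{\lambda}}\,\alpha(0)\sum_{r=0}^{\infty}\frac{(2r)!}{2^{2r}(r!)^{2}}\left(1 - \frac{1}{2\lambda}\right)^{r}.$$
In virtue of the Stirling approximation to the factorial, the $r$th term of this, say $u_r$, becomes in the limit
$$u_r \sim \frac{1}{\sqrt{\pi r}}\left(1 - \frac{1}{2\lambda}\right)^{r}$$
so that $|u_r| \to \infty$ if $\dfrac{1}{2\lambda} > 2$.
Hence, for $\lambda < \frac{1}{4}$ the series is divergent.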
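The behaviour of the terms $u_r = \dfrac{(2r)!}{2^{2r}(r!)^{2}}\left(1 - \dfrac{1}{2\lambda}\right)^{r}$ can be seen numerically. The sketch below is illustrative only (the function name `series_term` and the sample values of $\lambda$ are assumptions): for $\lambda = 0.2$ the terms grow without limit, while for $\lambda = 1$ they decay and the partial sums approach $\sqrt{2\lambda}$, which is what is required for the full series, including the factor $\sqrt{\pi/\lambda}\,\alpha(0) = 1/\sqrt{2\lambda}$, to return $f(0) = 1$.

```python
from math import comb, sqrt


def series_term(r, lam):
    """u_r = C(2r, r) / 4^r * (1 - 1/(2*lam))^r, the r-th term at x = 0."""
    return comb(2 * r, r) / 4 ** r * (1.0 - 1.0 / (2.0 * lam)) ** r


def partial_sum(n_terms, lam):
    """Sum of the first n_terms terms u_0 + u_1 + ... ."""
    return sum(series_term(r, lam) for r in range(n_terms))


if __name__ == "__main__":
    for lam in (0.2, 1.0):
        sizes = [abs(series_term(r, lam)) for r in (10, 20, 40)]
        print(f"lambda = {lam}: |u_10|, |u_20|, |u_40| = {sizes}")
    print("partial sum for lambda = 1:", partial_sum(200, 1.0),
          "target sqrt(2):", sqrt(2.0))
```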
6.29. From the statistical viewpoint, however, the important question is not whether an infinite series can represent a frequency function, but whether a finite number of terms can do so to a satisfactory approximation. It is possible that even when the infinite series diverges its first few terms will give an approximation of an asymptotic character.

This subject has not yet been fully explored and there has been some controversy about the value of the finite Type A series. Two things seem clear:

(a) The sum of a finite number of terms of the series may give negative frequencies, particularly near the tails (as, for instance, in Example 6.2).

(b) The series in the Charlier form (6.40) may behave irregularly in the sense that the sum of $k$ terms may give a worse fit than the sum of $(k - 1)$ terms.

How serious these disadvantages may be depends on the purposes in view. So far as practical graduation is concerned it would appear that the finite Type A series is successful only in cases of moderate skewness and in many such cases a Pearson distribution is just as good. In many statistical inquiries we are more interested in the tails of a distribution than its behaviour in the neighbourhood of the mode, and it is here that the Type A series appears particularly inadequate.

But this is not by any means a unanimous view. Arne Fisher (1922) has considered a modified form of the series which he claims to meet most of these criticisms. He considers the series
$$f = (c_0 + c_1H_1 + \dots + c_rH_r)\,\alpha(x) \tag{6.53}$$
but determines the $c$'s, not from the observed moments and the relations (6.38) but by the method of least squares, i.e. so that
$$\sum\{f - (c_0 + c_1H_1 + \dots + c_rH_r)\,\alpha(x)\}^{2}$$
shall be a minimum. The method involves some laborious arithmetic, but Fisher has successfully graduated a number of actuarial experiences by using it.

Two other actuarial statisticians have pointed out the difficulties of the Type A series, Steffensen (1930) adducing some theoretical objections and Elderton (1938) summing up in favour of Pearson distributions.


Example 6.3 


As an illustration of the irregular behaviour of terms in the Type A series, consider the distribution
$$dF = \frac{1}{\Gamma(p)}x^{p-1}e^{-x}\,dx, \qquad 0 \le x \le \infty.$$
Its characteristic function is
$$\phi(t) = \frac{1}{(1 - it)^{p}}$$
and thus $\kappa_r = p\,(r - 1)!$ or, in standard measure,
$$\kappa_r = \frac{(r - 1)!}{p^{\frac{1}{2}r - 1}}.$$
From the manner of the formation of terms in (6.48) it is evident that the coefficient $c_r$ is the sum of terms in $\kappa_r$, $\kappa_{q_1}\kappa_{q_2}$, ..., $(\kappa_{q_1}\kappa_{q_2}\dots\kappa_{q_m})$, where $(q_1 \dots q_m)$ is a partition of $r$ such that no $q$ is less than 3. It will then be clear that, since $\kappa_q$ is of order $p^{1 - \frac{1}{2}q}$, the term of greatest order in $p$ is that with the greatest number of parts in $(q_1 \dots q_m)$. For example, if $r = 9$ it is $(3^{3})$, if $r = 8$ it is $(4^{2})$, and so on.

From these considerations we can find the order in $p$ of the terms in the Type A series. They are

Term        . . .  c_0   c_3   c_4   c_5   c_6   c_7   c_8   c_9   c_10  c_11  c_12
Order in p  . . .   0   -1/2   -1   -3/2   -1   -3/2   -2   -3/2   -2   -5/2   -2

The terms decrease in order of $p$, but not at all regularly, and it is clear that in general no coefficient will be negligible compared with a preceding one if $p$ is large. The asymptotic qualities of such series obviously require careful investigation in particular cases.
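The rule used in this example can be checked by enumeration: a product $\kappa_{q_1}\dots\kappa_{q_m}$ with $q_1 + \dots + q_m = r$ is of order $p^{m - \frac{1}{2}r}$, so the order of $c_r$ is the largest such exponent over partitions of $r$ into parts not less than 3. The sketch below lists those partitions and takes the maximum; the function names are illustrative assumptions.

```python
from fractions import Fraction


def partitions_min_part(total, smallest=3):
    """Yield partitions of `total` into non-decreasing parts, each >= smallest."""
    if total == 0:
        yield ()
        return
    for first in range(smallest, total + 1):
        for rest in partitions_min_part(total - first, first):
            yield (first,) + rest


def order_in_p(r):
    """Largest exponent of p among the cumulant products contributing to c_r."""
    # Each kappa_q is of order p^(1 - q/2) in standard measure.
    return max(sum(Fraction(2 - q, 2) for q in parts)
               for parts in partitions_min_part(r))


orders = {r: order_in_p(r) for r in range(3, 13)}

if __name__ == "__main__":
    for r, order in orders.items():
        print(f"c_{r}: order p^({order})")
```

The enumeration agrees with the closed form $\lfloor r/3\rfloor - \frac{1}{2}r$, which is the exponent obtained by taking as many parts as possible.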


The Type B Series 


6.30. Just as the Type A is derived from the normal integral, a Type B series has been derived by Charlier from the Poisson distribution. Writing
$$\psi(m, x) = \frac{e^{-m}m^{x}}{x!} \tag{6.54}$$
for integral values of $x$, put
$$\psi(m, x) = \frac{e^{-m}}{\pi}\int_{0}^{\pi}e^{m\cos t}\cos(m\sin t - xt)\,dt \tag{6.55}$$
for all $x$. When $x$ is integral this reduces to (6.54).* In other cases
$$\psi(m, x) = \frac{e^{-m}}{\pi}\sin \pi x\sum_{j=0}^{\infty}\frac{(-1)^{j}m^{j}}{j!\,(x - j)}. \tag{6.56}$$
Write
$$\nabla\psi(m, x) = \psi(m, x) - \psi(m, x - 1)$$
and
$$f(x) = \sum_{j=0}^{\infty}b_j\nabla^{j}\psi(m, x). \tag{6.57}$$


This is the Type B. Charlier recommends it in cases of skewness when Type A is inapplicable (though the dividing-line is not clear). In theory it may [...] variates, but in practice has only been applied to [...] equal intervals. In fact, [...] continuous form [...] appear (cf. Steffensen, 1930).


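6.31. Defining polynomials $G_r$ by the relation
$$\psi(m, x)\,G_r(m, x) = \frac{d^{r}}{dm^{r}}\psi(m, x) \tag{6.58}$$
we find that
$$G_r = \text{coefficient of } \frac{t^{r}}{r!} \text{ in } \left(1 + \frac{t}{m}\right)^{x}e^{-t}. \tag{6.59}$$

* The integral $\int_{0}^{2\pi}e^{me^{it} - ixt}\,dt$, by the substitution $e^{it} = z$, is $2\pi$ times the residue of $e^{mz}/z^{x+1}$ in the unit circle and is thus equal to $2\pi m^{x}/x!$.

In a similar manner to that used for the Tchebycheff-Hermite polynomials we have
$$\Delta G_r(m, x) = G_r(m, x + 1) - G_r(m, x) = \frac{r}{m}G_{r-1}(m, x) \tag{6.60}$$
which may be compared with (6.31). It may also be shown that
$$G_r = (-1)^{r}\frac{\nabla^{r}\psi(m, x)}{\psi(m, x)} \tag{6.61}$$
and thus $G_r$ may be calculated from the $r$th differences of the Poisson function $\psi(m, x)$ in the same way that $H_r$ may be derived from the $r$th differential coefficient of the normal distribution.

The $G$'s also obey the orthogonal law
$$\sum_{x=0}^{\infty}G_r(m, x)\,G_s(m, x)\,\psi(m, x) = \begin{cases}0, & r \neq s\\[2pt] \dfrac{r!}{m^{r}}, & r = s.\end{cases} \tag{6.62}$$
Thus if
$$f(x) = \sum_{j=0}^{\infty}b_j\nabla^{j}\psi(m, x)$$
then
$$b_r = \frac{(-1)^{r}m^{r}}{r!}\sum_{x=0}^{\infty}f(x)\,G_r(m, x). \tag{6.63}$$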
TABLE 6.2

Type B Series fitted to a Discontinuous Distribution of Particles emitted by a Radioactive Element in Units of Time.

   x      Frequency.    Type B        Type B        Type B
                        (2 terms).    (3 terms).    (4 terms).
   0          57           49.5          49.0          58.2
   1         203          201.3         201.0         199.8
   2         383          403.4         404.3         386.1
   3         525          532.3         533.8         523.9
   4         532          520.6         521.5         532.1
   5         408          402.6         402.5         418.2
   6         273          254.8         254.4         260.2
   7         139          137.1         136.7         134.0
   8          45           64.0          63.9          56.7
   9          27           26.1          26.2          22.9
  10          10            9.4           9.6           8.6
  11           4            3.0           3.1           3.6
  12           0            0.9           0.9           1.6
  13           1            0.2           0.2           0.8
  14           1            0.0           0.0           0.3
 TOTALS     2608         2605.2        2607.1        2609.0


In the same manner as for Type A we have, choosing the constant $m$ equal to $\mu'_1$,
$$\begin{aligned}
b_0 &= 1\\
b_1 &= 0\\
b_2 &= \tfrac{1}{2}(\mu_2 - m)\\
b_3 &= -\tfrac{1}{6}(\mu_3 - 3\mu_2 + 2m)\\
b_4 &= \tfrac{1}{24}\{\mu_4 - 6\mu_3 + \mu_2(11 - 6m) + 3m(m - 2)\}
\end{aligned} \tag{6.64}$$
etc.


Example 6.4


Table 6.2 shows the frequency of the number of alpha-particles (x) emitted by a bar of polonium in intervals of one-eighth of a minute in some experiments by Rutherford


The Normalisation of Frequency Functions 
6.32. Several of the important theoretical distributions Occurrin 


normal, but for small or moderate n this may be hardly exact enou 
we are nevertheless able to use the normal integral by seeking for a у; 


È = do + аул + aye? + ax? + , on 


s . - (6.65 
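where the $a$'s are of order $n^{-\frac{1}{2}}$ or smaller. By choosing the $a$'s appropriately we can make the distribution of $\xi$ much nearer to normality than that of $x$ and hence find the distribution function of $x$ from that of $\xi$, assumed normal.

Consider in fact the Edgeworth form of the Type A expansion (6.45)
$$\exp\left\{-\frac{\kappa_1 - m}{1!}D + \frac{\kappa_2 - \sigma^{2}}{2!}D^{2} - \frac{\kappa_3}{3!}D^{3} + \frac{\kappa_4}{4!}D^{4} - \dots\right\}\frac{1}{\sigma}\,\alpha\!\left(\frac{x - m}{\sigma}\right). \tag{6.66}$$
We have retained the terms in $D$ and $D^{2}$ because the approximation may perhaps be slightly improved by taking $m$ and $\sigma^{2}$ in the $\xi$-distribution not quite equal to the mean and variance of $x$.

We now assume that the cumulant $\kappa_r$ is of order $n^{1-r}$, a case of fairly common occurrence; that $\kappa_1 - m$ is, by choice of $m$, of order $n^{-1}$; and that $\kappa_2 - \sigma^{2}$, by choice of $\sigma^{2}$, is of order $n^{-2}$, so that we may write
$$\kappa_1 - m = l_1\sigma, \qquad l_1 = O(n^{-\frac{1}{2}})$$
$$\kappa_2 - \sigma^{2} = l_2\sigma^{2}, \qquad l_2 = O(n^{-1}).$$
Then $\sigma^{2}$ is of order $\kappa_2$, i.e. $n^{-1}$, and thus
$$\frac{\kappa_r}{\sigma^{r}} = l_r = O(n^{1 - \frac{1}{2}r}).$$
Thus (6.66) may be written
$$\exp\left(-l_1\sigma D + \tfrac{1}{2}l_2\sigma^{2}D^{2} - \tfrac{1}{6}l_3\sigma^{3}D^{3} + \tfrac{1}{24}l_4\sigma^{4}D^{4} - \tfrac{1}{120}l_5\sigma^{5}D^{5} + \tfrac{1}{720}l_6\sigma^{6}D^{6} - \dots\right)\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x - m)^{2}}{2\sigma^{2}}\right\} \tag{6.67}$$
where $l_1$ and $l_3$ are $O(n^{-\frac{1}{2}})$
$l_2$ and $l_4$ are $O(n^{-1})$
$l_5$ is $O(n^{-\frac{3}{2}})$
$l_6$ is $O(n^{-2})$, etc.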
Expanding the operator and retaining only terms up to and including 0(»-3) in the 
Us we find for the operator 


1 — loD + {100° — 1,63D* + 1,00% — 1151,05D5 + „11,0808 4 
+180 apa за 3 - 


1 
ROD owe D? — Lho*D? + MoD’ — +, cae 
Lalo D* — т 970" + sbalalo8D*) + 4 ( — Возр" 


as 


+ 2510909 — 


atil D + 20lo4D4 — hoD — loD + aslo nDO — у, 18070" 
+ 098 + Milde! + ӘӘ) + ay (сара + 
+ 301,099 + BBD + LBD) O, POL. с) 


The result of this operation is a similar expression, which we will not bother to write out at length, with the operator $\sigma^{r}D^{r}$ replaced by $(-1)^{r}H_r\!\left(\dfrac{x - m}{\sigma}\right)$ and multiplied by
$$\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x - m)^{2}}{2\sigma^{2}}\right\}.$$
The distribution function is given by integrating this expression, and we then have for the frequency less than or equal to $m + \sigma x$ (arranging the terms in order of magnitude in $n$)

e —әл®- т): n 


|. 1 1 
-o V (27) v (27) 
T Ez E TRH») — QRH. + WH. + LH. + АЛАН, + y 
LBH, + ТЗН, + ASH) (ИН. -+ 48H, 1 
T ОШЫН, т АЙН, Y ЗАН + ТН, + т} 
+ ан, + aH, + rial _ i на, Ẹ vli 
ТЫНЫ] зо в А ГОИ (бз) 


eV ах + 


eg р— (l T 44H.) ES (uH, + 15H, 57 IH, 


6.33. Now let $\xi$ be a normal variate. We will determine $\xi$ in terms of $x$ such that
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\xi}e^{-\frac{1}{2}t^{2}}\,dt = F(x), \tag{6.70}$$
$F(x)$ being the distribution function given by the Type A expansion (6.69).

We have
$$\int_{-\infty}^{\xi}\alpha(t)\,dt = \int_{-\infty}^{x - (x - \xi)}\alpha(t)\,dt = \int_{-\infty}^{x}\alpha(t)\,dt - (x - \xi)\,\alpha(x) + \frac{(x - \xi)^{2}}{2!}D\alpha(x) - \dots \text{ etc.,}$$
by Taylor's theorem,
$$= \int_{-\infty}^{x}\alpha(t)\,dt - \alpha(x)\left\{(x - \xi) + \frac{(x - \xi)^{2}}{2!}H_1(x) + \frac{(x - \xi)^{3}}{3!}H_2(x) + \dots\right\} \tag{6.71}$$
and this is equal to (6.69).
The next step is to invert this series so as to give $(x - \xi)$ in powers of $x$. Assuming
$$x - \xi = a_0 + a_1x + a_2x^{2} + \dots \text{ etc.,} \tag{6.72}$$


we see that when x = 0, £ = — ay, z — £is of order n-* ; and hence, to order n~? we have 
from (6.71), with x = 0, 


A (с аз 

& seres (а 

and this is equal to the expression in square brackets in (6.69) with x = 0. 
We then find 


а = l zz ils aan EU T 11, AF НАА + ЧАЙ 3P vols ж Fall a Tür 
We can now find a, in (6.72) by identifying coefficients in z, and so on. 
reduction we find, writing the terms in descending order in n, 
z— È= 1 + da? — 1) + Mac — 1 + 1003 — Зх) — quls (4x* — 72) — 311, 

T 00, аы 502 — 8) — Ша — 1) + тїшї — Gx? + 3) + (19да — 

T тіж 1304(1124 — 49ш + 15) + 150692 — 1872? + 52) — abe + НДАЙ 
+l — (Та? — 152) — sohls(z* — 32) +15 (д® — 1022 + 15x) 
— 1052 + plik (36x? — 49x) — ahl (5x5 — 320° + 35x) + gelilsl (1123 — 21a) 
— siglas — 482 + 512) — 11813828 — 1872) +. zi (11125 
— 54129 + 4562) — L5, 1(04825 — 3628? + 24732), 


- (6.73) 
This is our required expression of the variate $ in terms of the variate 424 То 
n-? at least & will be normally distributed. ; 


It is often more convenient to express vin terms of 2. This may be done b 


z —5 —g(z) = g(5 + s — 8) 
=9(5) + = £g) E... 


= 96) + g'()(g(5) += — € (ЖӨ од 


2] 
Tl, + R. 


After some algebraic 


7) 


order 


y noting that 


and by continuing the process 
v —£—90) + 9g) + 992) + age 
+ 495g" (E) 2 .... 3 1 d / с Я ) i - (6.74) 
Hence, using the value of £ given by (6.73) we find, after some reduction, 
a —6—h + 4? — 1) + Ue + zall? — 3£) — 
+ rish(£^ — 662 + 3) — 2.131404 — 52° 
— 485 — hlé — 32) + vish(55 — logs 4 195) + A41i2(10£3 — 255) 


— sh.l(3P — 2489 + оору — Toolals(25> — 1788 4 218) + shel (145 
7 10329 + 1078) — 74, (25255 — 16883 + SOMNI OR aan (6.75) 


(E) + g(5)g'*(t) + 29" )g Eg E) 


эб (228 — 54) — ap aces _ 1) 
+ 2) + 31401255 — 5322 + 17) 


Example 6.5 


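Consider again the distribution of Example 6.3,
$$dF = \frac{1}{\Gamma(p)}x^{p-1}e^{-x}\,dx, \qquad 0 \le x \le \infty.$$
We have already found that, in standard measure,
$$\kappa_r = \frac{(r - 1)!}{p^{\frac{1}{2}r - 1}}.$$
As $p$ tends to infinity this tends to the normal form, and we see that $\kappa_r$ is of order $p^{1 - \frac{1}{2}r}$.
We will take $l_1$ and $l_2$ of (6.67) to be zero, which implies that our normal variate $\xi$ is to have the same mean and variance as that of $x$.

We have
$$l_3 = 2p^{-\frac{1}{2}}, \qquad l_4 = 6p^{-1}, \qquad l_5 = 24p^{-\frac{3}{2}}, \qquad l_6 = 120p^{-2}.$$

(6.69) then becomes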


г 1 1 1 1 1 1 1 1 
. ~it a — 32? ah L Н ЕЕ 1 Н, - - H; 
[és e oa [t + gas + gals 5р! t o He 4 Tap 
1 e 1 Wi үп 
\ gps i gays + 15: i zy : 19145: ; 


Let us, as a simple illustration, find the distribution function of $x$ for $p = 9$, $x = 12$. The mean of the distribution is then 9 and its variance 9, so that this corresponds to a deviation $(12 - 9)/\sqrt{9}$ in standard measure, equal to unity. It is found from (6.30), and an additional equation for $H_{11}$, that
$H_2 = 0$, $H_3 = -2$, $H_4 = -2$, $H_5 = 6$, $H_6 = 16$, $H_7 = -20$, $H_8 = -132$, $H_9 = 28$, $H_{10} = 1216$, $H_{11} = 936$.
We then find for the distribution function
$$\int_{-\infty}^{1}\alpha(x)\,dx + \alpha(1)\,(0.015{,}163{,}5).$$
The values for the normal function are obtained from the tables and we get the value
0.841,345 + (0.241,970,7)(0.015,163,5) = 0.8450,
which is exact to four places. The approximation is evidently fairly good, even for values of $p$ as low as 9.

We could have found the same result by using (6.73). Substituting $x = 1$ in that equation we find
$$\xi = 1.015{,}386,$$
and the distribution function for the normal integral with deviate equal to this value of $\xi$ is 0.8450 as before.

Suppose now we wish to find the deviate $x$ whose distribution function is $F(x) = 0.99$ when $p = 15$.

The normal deviate $\xi$ corresponding to such a value is found from tables to be 2.326,348. We then have from (6.75)
$$x = \xi + \tfrac{1}{6}l_3(\xi^{2} - 1) + \tfrac{1}{24}l_4(\xi^{3} - 3\xi) - \dots, \text{ etc.,}$$
which will be found to give
$$x = 2.697{,}22.$$
This is the value in standard measure. The deviate in ordinary measure is
$$15 + x\sqrt{15} = 25.45.$$
This is exact to two places of decimals.

The example shows that, notwithstanding the non-convergence of the infinite Type A series, a satisfactory approximation may be obtained from its first few terms, at least in certain cases. We may remark without proof that by an adaptation of a procedure given by Cramér (1928) it may be shown that an asymptotic expansion does in fact exist for the distribution of this example.
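The two figures quoted as exact in this example can be checked independently, because for integral $p$ the distribution function of $dF = x^{p-1}e^{-x}dx/\Gamma(p)$ is $1 - e^{-x}\sum_{k<p}x^{k}/k!$. The sketch below does so, and also evaluates the leading terms of the inversion with $l_1 = l_2 = 0$, namely $x = \xi + \frac{1}{6}l_3(\xi^{2} - 1) + \frac{1}{24}l_4(\xi^{3} - 3\xi) - \frac{1}{36}l_3^{2}(2\xi^{3} - 5\xi)$. This truncation is an assumption made for illustration and is not the full expansion (6.75); the function names are likewise illustrative.

```python
from math import exp, sqrt
from statistics import NormalDist


def gamma_cdf(x, p):
    """P(X <= x) for the Type III (gamma) distribution with integer index p."""
    term, total = 1.0, 1.0
    for k in range(1, p):
        term *= x / k          # term is now x^k / k!
        total += term
    return 1.0 - exp(-x) * total


def gamma_quantile(prob, p, lo=0.0, hi=200.0):
    """Invert gamma_cdf by bisection on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, p) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def leading_terms_quantile(prob, p):
    """Deviate in ordinary measure from the leading terms of the inversion."""
    xi = NormalDist().inv_cdf(prob)
    l3, l4 = 2.0 / sqrt(p), 6.0 / p
    x_std = (xi + l3 * (xi ** 2 - 1) / 6 + l4 * (xi ** 3 - 3 * xi) / 24
             - l3 ** 2 * (2 * xi ** 3 - 5 * xi) / 36)
    return p + x_std * sqrt(p)


if __name__ == "__main__":
    print("F(12) for p = 9      :", gamma_cdf(12.0, 9))
    print("exact 0.99 point, p=15:", gamma_quantile(0.99, 15))
    print("leading-terms value   :", leading_terms_quantile(0.99, 15))
```

With only these leading terms the approximation to the 0.99 point is already within about 0.01 of the exact value; the further terms of (6.75) account for the remaining difference.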


NOTES AND REFERENCES 


An excellent account of Pearson's distributions is given in Elderton's book. Examples of the fitting of the distributions to the data of experience abound in Biometrika.

For the Type A series see Charlier (1906 and 1931), Henderson (1922), Cramér (1928) and Bowley (1928). For the Type B series see Charlier, Jordan (1927), Aroian (1937) and Steffensen (1930).

Charlier has also proposed a Type C series, as to which see his paper of 1928 and the brochure of 1931.

A very good general account of these distributions and an examination of the possibility of extending them to the bivariate case is given by Pretorius (1930), who gives a number of references. Up to the present no entirely satisfactory system of bivariate distributions corresponding either to those of Pearson or to those of Charlier has been [...]

For some early efforts by Edgeworth to transform distributions to the normal form see Bowley (1928) and Pretorius (1930). The approach of sections 6.32 and 6.33 is due to Cornish and Fisher (1937), who give some tables which are useful in this type [...]

The polynomials $H_r(x)$ are frequently referred to by English writers as Hermite polynomials, but they are really due to Tchebycheff (Mémoires de l'Académie de Saint Pétersbourg, 1860). Hermite's papers on this subject followed four years later (Comptes rendus, 58, 93 and 266).

An interesting approach by fitting curves to the distribution function instead of the frequency function has been made by I. W. Burr (1942), "Cumulative frequency functions," Ann. Math. Stats., 13, 215.


Aroian, L. A. (1937), "The Type B Gram-Charlier series," Ann. Math. Statist., 8, 183.
Bowley, A. L. (1928), F. Y. Edgeworth's Contributions to Mathematical Statistics, Royal Statistical Society.
Charlier, C. V. L. (1906), Researches into the Theory of Probability, Meddelanden från Lunds Astronomiska Observatorium, Series II, No. 51.
—— (1931), Applications à l'astronomie (one of the series in Borel's Traité du calcul des probabilités, Gauthier-Villars, Paris).
Comrie, L. J. (1939), Tables of $\tan^{-1}x$ and $\log(1 + x^{2})$, Cambridge University Press.
Cornish, E. A., and Fisher, R. A. (1937), "Moments and cumulants in the specification of distributions," Revue de l'Inst. Int. Stat., 5, 307.
Cramér, H. (1926), "On some classes of series used in mathematical statistics," Den Sjette Skandinaviske Matematikercongres, Copenhagen.
—— (1928), "On the composition of elementary errors," Skandinavisk Aktuarietidskrift, 13 and 141.
Edgeworth, F. Y. (1904), "The Law of Error," Cambridge Phil. Trans., 20, 36, 113 (and an appendix issued with bound reprints).
Elderton, Sir W. P. (1938), Frequency Curves and Correlation, 3rd edition, Cambridge University Press.
Fisher, Arne (1922), Frequency Curves, Macmillan.
Henderson, J. (1922), "On expansions in [...]," Biometrika, 14, 157.
Jordan, C. (1927), Statistique Mathématique, Gauthier-Villars, Paris.
Jørgensen, N. R. (1916), Undersøgelser over Frequensflader og Korrelation, Busck.
Pearson, K. (1925), "The fifteen constant bivariate frequency surface," Biometrika, 17, 268.
Pretorius, S. J. (1930), "Skew bivariate frequency surfaces examined in the light of numerical illustrations," Biometrika, 22, 109.
Romanovsky, V. (1924), "Generalisation of some types of the frequency curves of Professor Pearson," Biometrika, 16, 106.
Steffensen, J. (1930), Some Recent Researches in the Theory of Statistics and Actuarial Science, Cambridge University Press.
Wishart, J. (1926), "On Romanovsky's generalised frequency curves," Biometrika, 18, 221.


EXERCISES 


6.1. Show that for the Pearson distributions
$$\frac{d\log y}{dx} = \frac{x}{B_0 + B_1x + B_2x^{2}}$$
the range is unlimited in both directions if $B_0 + B_1x + B_2x^{2}$ has no real roots; limited in one direction if the roots are real and of the same sign; and limited in both directions if the roots are real and of opposite sign.


6.2. Show that the Pearson Type VI curve may be written 
r2 —m -1z 
y- {1 Ж =) e" tanh ^T 


and discuss the relationship with the Type IV curve. 


6.3. Assign the following distributions to one of Pearson's types:— 


К 
dF = ke 254"! dy 
k dt 


zi 
y 
dF = 1 — r?) dr 


x 
dF = kyP-3(1 — 2) ? dn 
(All these distributions are important in the theory of sampling.) 


dF = 


6.4. Show that for the Type B series the coefficients of equation (6.63) may be written 
Eg: 


Т g j 
am jin F (Drie ym + (s jiy- am? . } 


(C. Jordan, 1927.) 


6.5. Show that, in the notation of 6. ey 


A y(m, 2) = — x. v(m, 2 + 1), 
Li 
Hence that Ху, x) —1 —L,0 + 1), 
T=0 



where In is the incomplete J-function ; and hence that the sum of the first (A + 1) terms | 

of the Type B series is given by | H 
b, — 1,0 + 1)) — (bi + 0,6, + . . )у(т, 2) inst. 

= (С. Jordan, 1927.) | 


6.6. Show that if $y$ is a function of $x$ which it is desired to represent approximately by the form
$$y = \sum_{j=0}^{k}c_jH_j(x)\,\alpha(x),$$
then the values of the $c$'s appropriate to the expansion of $y$ in this form are such as to minimise the sum
$$\int_{-\infty}^{\infty}\left[\frac{y - \sum_{j=0}^{k}c_jH_j(x)\,\alpha(x)}{\sqrt{\alpha(x)}}\right]^{2}dx.$$
| 
URS GOMA Tif (a+ 2) dx \ 
6.7. Show that for a Pearson distribution — = the characteristic E 


f 6 +02 + 0,2? 

function obeys the relation 
602? + (1 + 2 +60) 4 a+b + 0,0) = 0, 
2 192 2 1 do 1 0' — Vs 


where 0 = it. Deduce the recurrence relation between the moments. 
Show also that the cumulative function obeys the relation 


а? ау\2 а 
vole + (25) } + (1+ 20, + 050 + (a + bs + bab) = 0, 


Hence show that the cumulants obey the recurrence relations 


{L + (r + 2)0944 + rbi, + us st yes 1 + ( " Dens a ocu 


А ES 
*E \ё j sissies 2566 +( 1 7) = 0. 


6.8. Show that no distribution which is not completely determined by its moments can be expanded in a convergent Type A series.


6.9. If the distribution 


1 А 
тЫ dz: — 0 <= «o 


-d | 
is transformed by 2 = 709810 &—1 | 


1 
апа Б = 10810, А = Фе, 


show that Pr АА з) 4 | 
Bs 3 = 252° + 92 4 3) _ 6 


and that pi)? — 343l)? — ud = 0, 




where иу is the first moment about the start of the transformed curve. Thence that 
l = 2 log ш — 4 log (и: + 15?) 
bk? = 2 log иу — 21. 


6.10. Show that if a function in standard measure is expanded in a Type A series the coefficients of the second and third terms depend respectively on $\beta_1$ and $\beta_2$ and thus provide measures of skewness and kurtosis.


6.11. If f(x) and g(x) are two differentiable frequency functions with cumulants 
denoted respectively by к and к’ show that formally 


f(x) = exp by = 7 s(- чь) joe 


J=0 


CHAPTER 7 
PROBABILITY AND LIKELIHOOD 


ir parameters and distributions when 
not possible to make any statements on these т 


t is necessary if scientific inquiry is to go 
finite nature in terms of probability. In 
vith the theory of probability as it affects 
applications in Statistical theory. 


probability ”, * chance » or “ likeli- 
me proposition of whose truth we are 


that if a penny is tossed ten times it will come do s 
to rain to-morrow, and so on. Itisrarely i 
with a proposition of whose truth we are 
to assume that such propositi 
in a rational way. 

The attitude of doubt we adopt is descri 
the propositions are more or less probable 


hood ” to describe an attitude of mind towards sor 


bed in terms (9) 


f probability. We say that 
and accept or reje 


ct them accordingly, 


115 wet weather in the 
But it doe. 
that every p 


d that, Whereas we m. 


SO great as to be near certainty. 
probability do not admit) 

сап be compared. Jt could with 

the probability of getting ten trumps in a game of cards wi 

is no way of comparing the probabiliti 

outside the orbit of Pluto and that 




(c) The degree of probability attributed to a proposition varies according to the amount
of relevant evidence available to the particular mind considering the proposition. If we
know that a horse has won its three previous races we attach a greater probability to the
proposition that it will win the next. If we know that a penny has heads on both sides
the probability that it will come down heads when tossed is so great as to amount to
certainty; and so on.

(d) Pursuing this last point, we see that certainty can be regarded as a limiting form 
of probability. As a proposition becomes more and more probable it tends towards certain
truth ; as it becomes more and more improbable it tends towards certain untruth. 


7.4. The object of the theory of probability is to give to the somewhat indefinite 
notions described above the precision of a science, and, since numerical measurement is 
the greatest precision which a science can possess, to measure probability numerically. 
Several writers have explored the more general problem, foreshadowed as early as Leibniz, 
of developing a logic of probabilities, and the reader who is interested may refer to the
work of Keynes (1921), F. P. Ramsey (1931) and Johnson (1921-4). From the statistical 
viewpoint the interest of this subject centres in the numerical theory of probability which 
alone will concern us in this book. 

It is at this point that we arrive at the first of the differences of opinion among
authorities on the theory of probability. Some writers try to include all the ideas generally
associated with the word “probability” within the scope of their theory, which is thus
applicable to any of the attitudes of doubt covered by the meaning of the word. The
principal modern exponent of this viewpoint is Jeffreys, whose book (1939) should certainly
be read by all serious students of the subject. Most statisticians, on the other hand, are
concerned with the probabilities of propositions of a particular kind, namely, those which
form the members of populations of propositions. Under the more general theory, it has
meaning to speak of the probability of an isolated proposition such as the one that
Shakespeare’s plays were written by Francis Bacon. In statistics we are more usually concerned
with the proposition which asserts the happening of some event which could have arisen
in a specified number of ways, such as the throwing of a number with an ordinary die.
The first approach takes probability to be an undefined idea, like the straight line of Euclidean
geometry, and builds up the theory from certain axioms. The second approach seeks to
define probability in terms of the relative frequency of events and thus to throw the theory
back on to the pure mathematics of abstract ensembles (Kolmogoroff, 1933) or to the
limiting properties of sequences (von Mises, 1936). The reader who is perplexed by the
controversy between the adherents of the axiomatic and the frequency theories will find
many of his difficulties resolved by the consideration that the two theories cover different
domains of thought, or rather, that the axiomatic theory attempts to cover a wider domain
than the frequency theory.


7.5. This, however, does not explain away the whole of the difficulty, and the reader 
will have to choose for himself among the various possible sets of fundamental ideas forming 
the starting-point of the theory. When we consider the concept of probability as a psycho- 
logical matter we can either suppose that further analysis is impossible or unprofitable, 
in which case the axiomatic approach seems inevitable; or we can ask how the mind comes
to take up an attitude of belief in propositions which confront it. It is not necessary here 
to go into this question at length, but there would, in my own opinion, be a considerable 
measure of agreement that the concept of probability is founded on our experience of the 




frequency of observed phenomena. When we say that the probability of a coin coming 
down heads on being tossed is one-half we have in mind, I think, that if it is tossed a large 
number of times it will come down heads in approximately half the cases. Even in extreme 
cases, say, when we attempt to assess the probability of a horse winning a given race, an
event which cannot be repeated, we are, I think, picturing our estimation as one of a number 
of similar acts and assessing the relative frequency of the horse’s victory in that population. 

But it has to be admitted that, even if this be true, there is no necessity to use the 
concept of frequency in the axiomatisation of the theory. The concept of a straight line 
may very well be founded on our experience of the local properties of rays of light, but it
does not follow that the indefinables of Euclidean geometry are to be analysed into optical
concepts.


The Basic Rules of Direct Probability 


7.6. For our present purposes the problems of fundamentals may be passed over, 
since all parties are agreed on the rules governing the calculus of direct probabilities. (The 
so-called “ inverse ” probabilities will require more discussion and will be dealt with later.) 

We therefore enunciate these rules without attempting to deduce them from more primitive
propositions.

In the first place it is assumed that probability is measurable on a continuous scale,
so that any probability can be expressed as a real number. We shall, in fact, say that a
probability is x, a real number. This assumption implies,

7.7. The probability of a proposition q on data p is written $P(q \mid p)$. We have then

Rule 1.—
$$\text{If } p \text{ entails } q,\quad P(q \mid p) = 1 \tag{7.1}$$
$$\text{If } p \text{ entails not-}q,\quad P(q \mid p) = 0 \tag{7.2}$$

This rule defines the end-points of our scale of probability. Certainty that a proposition
is not true is represented by zero, certainty that it is true by unity. Any probability lies
in the range 0 to 1.


7.8. Rule 2.—If $q_1 \ldots q_n$ are a set of equally probable and mutually exclusive
propositions on data p, and if Q is a subset of m of these propositions, then

$$P(Q \mid p) = \frac{m}{n}. \tag{7.3}$$

This proposition is the starting-point of the frequency theory of probability. It is
usually stated in some such form as: if of a set of n mutually exclusive and equally probable
events m are distinguished by some characteristic A, the probability of an event bearing
A is $m/n$.


The objection to this rule from the logi 
“equally probable ” and is thus circular if one ado 
theorist dealing with probability, 
either by accepting the circular 
of sets of points. For example, such a definition 1 


unity. Any probability lies _ 


n in number m are characterised by some quality A, the probability of any member bearing
A is, by definition, the number $m/n$. To take a more sophisticated line, we can regard the


objects as points of a set, attach set-functions to them obeying certain axioms and postu- 
lates, and thus build up the theory of probability as a branch of the theory of set-functions. 
Any verification of the theory, any test whether it provides a reasonably accurate picture 
of the way things happen in the world, is referred to experimental physics. The mathe- 
matician, of course, is used to this devolution of responsibilities, but the statistician is 
concerned with concordance between theory and practice and cannot always leave experi- 
mental verification to others. 


7.9. Rule 3.—If the probabilities of n mutually exclusive propositions $q_1 \ldots q_n$
on data p are $P_1 \ldots P_n$, then the probability on data p that one of them is true is
$$P_1 + P_2 + \ldots + P_n.$$

This is generally known as the “ theorem of the addition of probabilities”. In the 
language of the textbooks, the probability that one of n mutually exclusive events will 
happen is the sum of their separate probabilities. 


7.10. Rule 4.—The probability of two propositions q and r on data p is the product
of the probability of q given p and that of r given q and p. Symbolically,

$$P(qr \mid p) = P(q \mid p)\,P(r \mid qp). \tag{7.4}$$
Since q and r appear symmetrically we also have
$$P(qr \mid p) = P(r \mid p)\,P(q \mid rp). \tag{7.5}$$


From the frequency standpoint this rule is almost self-evident. If of a set n, (a) bear
the characteristic A, (b) the characteristic B, and (ab) both characteristics, then the rule
states that
$$\frac{(ab)}{n} = \frac{(a)}{n} \cdot \frac{(ab)}{(a)},$$
a simple arithmetical proposition.
More generally we have
$$P(q_1 q_2 \ldots q_n \mid p) = P(q_1 \mid p)\,P(q_2 \mid q_1 p)\,P(q_3 \mid q_1 q_2 p) \ldots P(q_n \mid q_1 \ldots q_{n-1}\,p) \tag{7.6}$$
a result which follows from the repeated application of Rule 4.
If, as a particular case,
$$P(qr \mid p) = P(q \mid p)\,P(r \mid p) \tag{7.7}$$
we have, in virtue of (7.4),
$$P(r \mid qp) = P(r \mid p) \tag{7.8}$$
and q is then said to be irrelevant to r, given p. A knowledge of q does not affect the
probability of r on data p.
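The rules of addition and multiplication may be checked by direct counting. The following is a minimal sketch in Python, not part of the original text; the choice of two dice and of the propositions q ("first die even") and r ("total at least 10") is an assumption made purely for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered throws of two dice, assumed equally probable (Rule 2).
throws = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda t: True):
    """P(event | given), found by counting equally probable cases."""
    cases = [t for t in throws if given(t)]
    return Fraction(sum(1 for t in cases if event(t)), len(cases))

# Illustrative propositions (hypothetical choices).
q = lambda t: t[0] % 2 == 0          # the first die shows an even number
r = lambda t: t[0] + t[1] >= 10      # the total is at least 10
qr = lambda t: q(t) and r(t)

# Rule 4 in both of its symmetrical forms, (7.4) and (7.5).
lhs = prob(qr)
rhs_1 = prob(q) * prob(r, given=q)
rhs_2 = prob(r) * prob(q, given=r)

# Rule 3 for two mutually exclusive propositions: total 2, total 3.
s2 = lambda t: sum(t) == 2
s3 = lambda t: sum(t) == 3
either = prob(lambda t: s2(t) or s3(t))

print(lhs, rhs_1, rhs_2)             # the three values agree
print(either, prob(s2) + prob(s3))   # the two values agree
```

Exact fractions are used so that the equalities hold without rounding.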


7.11. The above four rules and various elaborations of them provide the basis of
the direct theory of probability, which is concerned with problems of the type: given a set
of propositions with known probabilities, determine the probability of some contingent
proposition. This is a branch of pure mathematics and will be found discussed, for example,
in most textbooks of algebra. Ultimately all problems in this branch of the theory are 
reducible to the counting of the number of ways in which certain events can happen. The 
following examples will illustrate the type of investigation involved. 


Example 7.1 


What is the probability that a specified player will get a hand containing 13 cards 
of one suit at a single deal at a game of bridge ? 


We have to consider here the total number of ways in which a given player can be
dealt a hand of cards. There are 52 cards and 13 can be chosen from them in $\binom{52}{13}$ ways.
Of these ways only four will contain cards of one suit.

We then assume that all the possible deals are equally probable and are thus able to
apply Rule 2. Here $m = 4$ and $n = \binom{52}{13}$, so that the probability is

$$P = \frac{4}{\binom{52}{13}} = \frac{4 \cdot 39!\,13!}{52!}.$$

Factorial expressions of this kind may be found from tabled logarithms of factorials or
by the use of the Stirling approximation. In this particular case we find
$P = 6 \times 10^{-12}$ approximately.
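The arithmetic of this example can also be carried out exactly by machine. The sketch below is an illustration only, using Python's standard library; the variable names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

# Number of distinct 13-card hands from 52 cards.
n_hands = comb(52, 13)

# Rule 2 with m = 4 favourable hands (one for each suit).
p_one_suit = Fraction(4, n_hands)

# The same quantity written with factorials, as in the text.
p_factorials = Fraction(4 * factorial(39) * factorial(13), factorial(52))

print(n_hands)
print(float(p_one_suit))   # of the order of 6e-12
```

The two expressions agree exactly, and the decimal value is consistent with the approximate figure quoted above.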


Example 7.2 


n letters, to each of which corresponds an envelope, are placed in the envelopes at
random. What is the probability that no letter is placed in the right envelope?


The condition that the lette

$$u_n = (n - 1)(u_{n-1} + u_{n-2}).$$
We may re-write this
$$u_n - n u_{n-1} = -\{u_{n-1} - (n - 1)\,u_{n-2}\}$$
and putting $v_n = u_n - n u_{n-1}$ we find
$$v_n = -v_{n-1} = (-1)^{n-2}\,v_2.$$
Thus
$$u_n - n u_{n-1} = (-1)^{n-2}(u_2 - 2u_1).$$
But $u_1 = 0$ and $u_2 = 1$ and thus
$$u_n - n u_{n-1} = (-1)^n$$
whence
$$u_n = n!\left\{\frac{1}{2!} - \frac{1}{3!} + \ldots + \frac{(-1)^n}{n!}\right\}.$$

The total number of possible ways is n!. Thus the probability required is
$$\frac{1}{2!} - \frac{1}{3!} + \ldots + \frac{(-1)^n}{n!},$$
i.e. the first (n − 1) terms of $e^{-1}$.
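The recurrence and the resulting series can be compared with a direct enumeration for small n. The following Python sketch is illustrative only; the function names are hypothetical and the brute-force count is practicable only for small n.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def misplacements(n):
    """u_n from u_n = (n - 1)(u_{n-1} + u_{n-2}), with u_1 = 0 and u_2 = 1."""
    if n == 1:
        return 0
    u_prev, u = 0, 1                      # u_1, u_2
    for k in range(3, n + 1):
        u_prev, u = u, (k - 1) * (u + u_prev)
    return u

def series(n):
    """1/2! - 1/3! + ... + (-1)^n / n! as an exact fraction."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(2, n + 1))

def brute_force(n):
    """Count the arrangements in which no letter is in its own envelope."""
    return sum(
        1
        for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )

for n in range(2, 8):
    # The recurrence, the series and the enumeration all agree.
    print(n, misplacements(n), brute_force(n), float(series(n)))
```

As n increases the printed probabilities settle rapidly towards $e^{-1}$.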


Example 7.3 

Three pennies are tossed. What is the probability that they fall either all heads or 
all tails ? 

We assume that the probability of a head with any penny is ½ and that the result
with one penny is independent of that with the others. Then there are eight possible
and equiprobable cases, HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. Two of
these give us all heads or all tails and hence the required probability is ¼.

Now consider this argument: there are two possibilities, either the three coins all fall
alike or two of them are alike and the other different. Of these two possibilities one is
of the type required and therefore the probability is ½.

Consider also this argument: there are four possibilities, three heads, two heads and
a tail, two tails and a head, three tails. Two of these four are of the type required and
therefore the probability is ½.

Finally, consider this argument: of the three coins two must fall alike. The other
must either be the same as these two or different. Thus there are two possibilities and
again the chance is ½.

These three arguments are fallacious. They assume equiprobability among events
which are not equiprobable and the application of Rule 2 is not legitimate. For example,
in the first case, it is true that there are two possibilities, but they are not equal under 
our assumptions. The reader may care to examine why this is so and how the other two 
arguments break down on the same point. 
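The point may be illustrated by listing the eight cases mechanically. The sketch below, in Python, is an illustration only; it counts the equiprobable cases and also shows how unequally they are shared among the four classes used in the second fallacious argument.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The eight equiprobable cases HHH, HHT, ..., TTT.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# All heads or all tails: the outcome uses a single distinct letter.
p_all_alike = Fraction(sum(1 for o in outcomes if len(set(o)) == 1), len(outcomes))

# Number of cases falling in each class "k heads"; the classes are not equiprobable.
by_heads = Counter(o.count("H") for o in outcomes)

print(outcomes)
print(p_all_alike)
print(dict(by_heads))
```

The classes "two heads and a tail" and "two tails and a head" each contain three of the eight cases, which is why treating the four classes as equally probable gives the wrong answer.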


Example 7.4 

Peter and Paul play a game with two dice. Peter plays first by throwing the dice 
together. If the total number of points is a prime number other than 2 he wins outright ; 
if it is even he throws again under the same conditions ; in other cases the throw passes 
to Paul, who throws under the same conditions. What is the probability of Peter's winning?

It is to be assumed that the probabilities of throwing any number 1 to 6 with either 
die are equal. The possible throws are 2, 3, 4 . . . 12 and the number of ways in which
they can occur are:

Total points . . .   2   3   4   5   6   7   8   9  10  11  12
No. of ways  . . .   1   2   3   4   5   6   5   4   3   2   1


Thus, according to Rule 2, the probability (1) of throwing a prime other than 2 is 14/36,
(2) of throwing an even number is 18/36, (3) of throwing neither is 4/36.
These three events are mutually exclusive. Let P be the probability of Peter's winning.
Now if Peter throws a prime other than 2 he wins outright, and the probability
of his doing so is thus 14/36; if he throws an even number he throws again, and his
probability of winning in this case (according to Rule 4) is 18P/36; if he throws neither the throw
passes to Paul, whose chance is then P, so that Peter's chance of winning is (1 − P).
Thus, according to Rule 3, we have
$$P = \frac{14}{36} + \frac{18P}{36} + (1 - P)\,\frac{4}{36},$$
giving
$$P = \frac{18}{22}.$$
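The table of ways and the final equation can be reproduced by enumeration. The following Python sketch is illustrative only; the variable names are hypothetical, and the linear equation for P is solved by simple rearrangement.

```python
from fractions import Fraction
from itertools import product

# Number of ways of throwing each total with two dice.
ways = {}
for a, b in product(range(1, 7), repeat=2):
    ways[a + b] = ways.get(a + b, 0) + 1

odd_primes = {3, 5, 7, 11}
p_win = Fraction(sum(w for t, w in ways.items() if t in odd_primes), 36)
p_again = Fraction(sum(w for t, w in ways.items() if t % 2 == 0), 36)
p_pass = 1 - p_win - p_again

# P = p_win + p_again * P + p_pass * (1 - P), rearranged for P.
P = (p_win + p_pass) / (1 - p_again + p_pass)

print([ways[t] for t in range(2, 13)])
print(p_win, p_again, p_pass)
print(P)
```

The result is 18/22 in its lowest terms.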


7.12. It is possible to carry mathematical problems on the foregoing lines to great
lengths, and a considerable amount of ingenuity has been expended in doing so. The
important thing to note from the point of view of the theory of probability is that in all
such cases certain probabilities are stated a priori, either explicitly or implicitly in some
such form as “the dice are perfect” or “the selection is made at random”. One of the
most formidable problems of statistics is that only in exceptional cases is there any prior 
certainty about the probabilities of observed events. 


Probability in a Continuum 


7.13. Up to this point we have considered only probabilities of finite and discrete
events; but we may also ask whether any meaning can be attached to probabilities in
a continuum. For example, if a square is inscribed in a circle, what is the probability that
a point taken at random in the circle is also inside the square? If a line is divided into 
three segments, what is the probability that they can form a triangle? What is the
probability that $x < x_0$, where x is a positive real number less than $y_0$? And so on.

All probabilities of this kind must be considered as limits. 
that of the square inscribed in the circle. 
cells of area $\varepsilon$ by a rectangular mesh. If we


of the circle as 


we please by taking $\varepsilon$ small enough. We may say that the probability is that ratio, which
is easily seen to be $2/\pi$, an incommensurable number.
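The limiting process with a rectangular mesh can be imitated numerically. The sketch below is an illustration under stated assumptions, not the author's construction: the circle is taken to have unit radius, the inscribed square is taken with its sides parallel to the mesh, and a cell is counted as belonging to a figure when its centre lies inside it.

```python
from math import pi, sqrt

def mesh_ratio(k):
    """Cells in the square divided by cells in the circle, for cells of side 1/k."""
    h = 1.0 / k
    half_side = 1.0 / sqrt(2.0)      # half the side of the inscribed square
    in_circle = 0
    in_square = 0
    for i in range(-k, k):
        for j in range(-k, k):
            x, y = (i + 0.5) * h, (j + 0.5) * h   # centre of the cell
            if x * x + y * y <= 1.0:
                in_circle += 1
                if abs(x) <= half_side and abs(y) <= half_side:
                    in_square += 1
    return in_square / in_circle

for k in (10, 50, 200):
    print(k, mesh_ratio(k), 2 / pi)
```

As the mesh is refined the ratio of the cell counts approaches $2/\pi$, the ratio of the two areas.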


Example 7.5
Consider a straight line OA bisected at B. What is the probability that a point chosen
at random on the line falls into the segment OB?

Let us suppose in the first place that the line is divided into n equal segments of length
$OA/n$. If we interpret the choosing of a point at random to mean the choice of one of these
intervals, the probability is obviously ½ as $n \to \infty$, for there will be half the intervals
in the segment OB.

Now let OP be drawn perpendicular to OA and equal to it in length, and imagine a star
of n + 1 lines drawn through P, including OP and PA, so as to divide the angle OPA ($= \pi/4$)
into equal angles $\pi/4n$. These lines cut off segments on OA, and we may, if we regard
equal angles as having equal probability, assign to these segments an equal probability,
for they subtend equal angles at P. If we make this convention it is evident that as
$n \to \infty$ the probability of a point falling into any segment on OA is proportional to the
angle subtended at P. For example, the probability that a point falls in the segment OB
is $\dfrac{4}{\pi}\tan^{-1}\tfrac{1}{2}$.

Now this is not the same answer that we got by assuming all small segments of OA 
equally probable. There is nothing paradoxical in this—the two answers are different 
because the two limiting processes were different. On a little reflection it will be clear 
that by moving the point P on the perpendicular to OA and taking a star of lines as before 
we can make the probability of obtaining a point in OB have any value we like. It is 
thus abundantly clear that the concept of probability in a continuum depends on the limit- 
ing process by which that continuum is reached from a finite subdivision of equiprobable 
intervals.
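The two limiting processes of this example give different numerical answers, as a short calculation shows. The following Python sketch is illustrative only; the function names and the parameter `height` (the distance OP) are hypothetical, and OA is taken as the unit of length.

```python
from math import atan, pi

def prob_ob_equal_segments(n):
    """Equal segments of OA equiprobable: share of segments whose mid-point lies in OB."""
    return sum(1 for k in range(n) if (k + 0.5) / n < 0.5) / n

def prob_ob_equal_angles(height, oa=1.0):
    """Equal angles at P equiprobable, P being at the given height on the perpendicular at O."""
    ob = oa / 2.0
    return atan(ob / height) / atan(oa / height)

print(prob_ob_equal_segments(1000))     # one-half
print(prob_ob_equal_angles(1.0))        # (4 / pi) * arctan(1/2), with OP = OA
print(prob_ob_equal_angles(0.25))       # a different value when P is moved
```

With OP equal to OA the second function reproduces the value $(4/\pi)\tan^{-1}\tfrac{1}{2}$, and the value changes as P is moved along the perpendicular.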


7.14. We have spoken above of the selection of objects “at random”. In the
mathematical theory of probability it is customary to define randomness in terms of
probability itself. A member of a population is said to be chosen at random if it is chosen by
a random method; and a random method is one which makes it equally probable that
each member of the population will be chosen. Randomness is extremely important in
the theory of sampling and we shall consider it at some length in the next chapter. At
this point it is sufficient to note that when we speak of random choice we really mean
a method of selection which gives to certain propositions an equal probability and hence
allows us to apply the calculus of probability a priori. The justification for this is, in the
ultimate analysis, empirical. It is found in practice that there exist selective processes 
which educe members of a population in such a way that the constituent events may be 
regarded as equiprobable ; and the theory of sampling is largely concerned with samples 
generated by such processes. 

It may be noted that, for continuous probabilities, randomness is dependent on the 
process to the limit just as probability itself is. 


The Approach of von Mises 


7.15. Suppose now we have a population of objects, each of which bears one of 
a number of characteristics. To simplify the exposition we will suppose that there are 
two characteristics denoted by 0 and 1. Suppose we draw members from this population 
and replace each member after drawing. Then the process of continued selection will 
generate a series such, for example, as

$$K = 01100100111010111100100 \ldots \tag{7.9}$$

Von Mises (1936) takes as the foundation of his theory of probability an infinite sequence
of this kind, the Irregular Kollektiv, obeying the following laws:

(a) The proportion of 0’s in the first n terms tends to a limit p as $n \to \infty$. This limit
is called the probability of the zero in the Kollektiv.

(b) If a subsequence is picked out of the Kollektiv by some method which is
independent of the Kollektiv itself (e.g. every third member, every member whose ordinal is
a square, every member following a zero, etc.), the proportion of zeros also tends to p as $n \to \infty$;
and this for every such subsequence.

The Irregular Kollektiv might, in fact, be described as the infinite random series. It
has no systematic qualities; for if, for example, the series consisted of repetitions of
0110, thus

$$K = 011001100110 \ldots \tag{7.10}$$

the subsequence consisting of every (4r + 3)th member would consist entirely of unities and the
condition (b) would be violated.
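The failure of condition (b) for the periodic series (7.10) is easily exhibited. The sketch below, in Python, is an illustration only; a finite stretch of 4000 terms stands in for the infinite series, and ordinals are counted from 1.

```python
from itertools import cycle, islice

# The first 4000 terms of the periodic series 0110 0110 0110 ...
series = list(islice(cycle([0, 1, 1, 0]), 4000))

def proportion_of_zeros(terms):
    """Relative frequency of 0 in a finite stretch of the series."""
    return terms.count(0) / len(terms)

whole = proportion_of_zeros(series)

# Members whose ordinal is 4r + 3 (r = 0, 1, 2, ...): list indices 2, 6, 10, ...
every_4r_plus_3 = series[2::4]
sub = proportion_of_zeros(every_4r_plus_3)

print(whole)   # proportion of zeros in the whole stretch
print(sub)     # proportion of zeros in the selected subsequence
```

The proportion of zeros in the whole stretch is one-half, whereas the selected subsequence contains no zeros at all.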


7.16. It is not difficult to show that probability defined in this way obeys the four rules
enunciated earlier in this chapter. Some authorities have, however, found difficulty in
accepting the basic concept of the Irregular Kollektiv and attributing any meaning to its
existence. It has even been claimed that the idea is self-contradictory, though this von Mises
strongly contests.

However this may be, the von Mises approach represents, in my own opinion, the nearest 
to a satisfactory basis of the frequency theory of probability that has been given. The 

i t a e same in any of the frequency theories once the 


t when it comes to relating theory to experience 
the von Mises method has decided advantages. For a discussion of this subject, reference 


may be made to the works listed at the end of this chapter; in particular I have given (1941) 
w eliminates the difficulties associated with the 


Probability and Statistical Distribution


7.17. We now proceed to consider the relationship between the theory of probability
and that of statistical distributions. Suppose we have a statistical population, finite and
discontinuous, distributed according to a variate x. If we take a member at random from
this population the probability that it bears an assigned variate-value $x_0$ is the frequency
function $f(x_0)$, for this is the proportion of members bearing that value. Similarly, the
probability that it bears a value less than or equal to $x_0$ is the distribution function $F(x_0)$,
as follows at once from Rule 3 and the definition of the distribution function.

This is the essential link between probabilities and distributions. The distribution
function gives the probability that a member of the population chosen at random will bear
a specified value of the variate or less. We must, however, consider whether this statement
can still be regarded as true for populations which are infinite or continuous.

Suppose in the first instance that the population is infinite and discontinuous. In such
a case we cannot select a member at random, but we may imagine
a selection from a finite population which tends to the infinite population under consideration. In
this finite population the proportion of members with values less than or equal to some $x_0$
will be $F(x_0)$ and thus, with due regard to the nature of the limiting process, …

$$\Delta F = f(x)\,\Delta x.$$
If a member is chosen at random from this population in such a way that equal ranges $\Delta x$
are equally probable, the probability that it falls in the range $\Delta x$ is $f(x)\,\Delta x$. In the limit we
may say that the probability of obtaining a value less than or equal to $x_0$ in taking a member
at random from a continuous population is $\int_{-\infty}^{x_0} dF = F(x_0)$. It must, however, be
remembered that the nature of the process to the limit should be specified.

Hereafter, in speaking of selecting a member at random from a population $dF = f(x)\,dx$
we shall assume that what is meant is a selection random in the limit for intervals $dx$, i.e.
such that intervals $dx$ are equally probable.


The Concept of Random Variable

7.18. The idea of a variable x which can appear with varying degrees of probability
$dF = f(x)\,dx$ has been elevated by mathematicians into a distinct concept, that of a random
variable. In ordinary analysis no such idea appears. We write “a variable x” meaning
that we are considering propositions about numbers which may be any of a certain range ; 
there is no thought that one of these values is to be considered more frequently than others 
or that it will occur more frequently in practice. The random variable, on the other hand, 
is to be regarded as defined by a distribution function. It may take any values in a given 
range, but the values are distinguished by an associated function. 


7.19. Let us consider what is meant by the addition of random variables. In ordinary
analysis, given two variables x and y, we may define a third variable
$$z = x + y,$$
which merely means that when $x = x_0$ and $y = y_0$, z will be $x_0 + y_0$. If x and y are random
variables, can we attach any useful meaning to z?

If the joint distribution function of x and y is $F_{12}$ we have that the frequency of $x \le x_0$
and $y \le y_0$ is $F_{12}(x_0, y_0)$. Consider some value $z_0$. We may then determine from $F_{12}$ the
frequency such that $x + y \le z_0$ which will, in fact, be the integral

$$\iint dF_{12}(x, y)$$

taken over the region for which $x + y \le z_0$.

This integral defines a function of $z_0$ which is in fact a distribution function, for it is zero
at $-\infty$, non-decreasing, and unity at $+\infty$. We may then define this as the distribution
function of the random variable z and say that z is the sum of the random variables x and y.
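For a discontinuous population the integral reduces to a sum and the definition can be tried out directly. The following Python sketch is an illustration only; the joint distribution chosen (two independent dice) is an assumption, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

# An assumed joint frequency of (x, y): two independent dice.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def dist_sum(z0):
    """Distribution function of z = x + y: total frequency over the region x + y <= z0."""
    return sum((p for (x, y), p in joint.items() if x + y <= z0), Fraction(0))

values = [dist_sum(z0) for z0 in range(0, 14)]
print(values)
```

The values printed begin at zero, never decrease and end at unity, so that the function so defined has the properties of a distribution function.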


7.20. More generally, suppose we have n random variables distributed in the
multivariate form $dF(x_1 \ldots x_n)$. We may then define a random variable z by a functional
equation

$$z = z(x_1 \ldots x_n). \tag{7.11}$$
The distribution function of z, at $z_0$, is the integral of $dF(x_1 \ldots x_n)$ over all values of $x_1 \ldots x_n$
such that $z_0 \ge z(x_1 \ldots x_n)$. We may regard the equation (7.11) as defining a new random
variable z with this as its distribution function.


Sampling Distributions 

7.21. We have noted that if a member of a population is chosen at random, the
probability that it will bear a variate-value not greater than x is the distribution function
F(x). Similarly, if we choose a member from a multivariate population, the probability
that it will bear a value of the first variate not greater than $x_1$, of the second not greater
than $x_2$ ... of the nth not greater than $x_n$, is the multivariate distribution function
$F(x_1, x_2 \ldots x_n)$. Further, if the variates are independent, as defined in 1.33, the rth
variate being distributed as $dF_r(x_r)$, this probability is equal to
$$F_1(x_1)\,F_2(x_2) \ldots F_n(x_n).$$

Now suppose that we have a selective process, which we will call sampling, applied to
a univariate population in such a way that it abstracts a group of n members. If this process
is repeated it will generate a multivariate distribution, each sample exhibiting n values
$x_1 \ldots x_n$. The nature of this multivariate distribution depends on the sampling process
as well as the population. If the distribution is $G(x_1 \ldots x_n)$, then this function represents
the probability that a random sample will result in n values, the first not greater than $x_1$,
the second not greater than $x_2$, and so on.

There is one type of sampling process of outstanding importance in statistical theory,
namely that in which the distribution $G(x_1 \ldots x_n)$ is the product of factors $G_1(x_1)$,
$G_2(x_2) \ldots G_n(x_n)$. In such a case the sampling is said to be simple. The distributions of
the values $x_1 \ldots x_n$ are independent one of another, and we may thus say that the selection
of any member is independent of that of any other. Moreover, if the sampling is random,
every G(x) will be equal to F(x), the distribution function of the population. Thus in this
case we have, for the distribution of the variate-values in samples of n obtained by a simple
random method,

$$dF(x_1 \ldots x_n) = dF(x_1)\,dF(x_2) \ldots dF(x_n) = f(x_1)\,f(x_2) \ldots f(x_n)\,dx_1\,dx_2 \ldots dx_n \tag{7.12}$$

and $F(x_1)\,F(x_2) \ldots F(x_n)$ is the probability that in such a sample the first value will not
exceed $x_1$, and so on. Moreover, since the x's appear symmetrically in (7.12) their order is
not material. The equation gives the probability that one member of the sample will not
exceed $x_1$, another $x_2$, and so on.


7.22. Suppose now we have a sample of n members of the population with
variate-values $x_1 \ldots x_n$. We may construct from these values some function, say

$$z(x_1 \ldots x_n) \tag{7.13}$$

which might, for example, be the mean or the variance. We may then ask: on certain
hypotheses as to the way in which the sample was derived, what is the probability that z is
not greater than some assigned value $z_0$? In terms of frequency, if all possible samples
$x_1 \ldots x_n$ were drawn and z computed for each of them, what proportion would fail to
exceed some value $z_0$?

As an illustration, suppose we draw a sample of two from the normal population

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\, dx.$$

Let the sampling be simple and random. Then in virtue of (7.12) the probability of values in
the ranges centred at $x_1$ and $x_2$ is

$$dP = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{1}{2\sigma^2}(x_1^2 + x_2^2)\right\} dx_1\,dx_2. \tag{7.14}$$

Consider now the quantity
$$z = \frac{x_1 + x_2}{2}.$$

What is the probability that z shall be not greater than some assigned $z_0$? It is seen to be
the integral of dP in equation (7.14) over the region such that $\tfrac{1}{2}(x_1 + x_2) \le z_0$, i.e.

$$P(z \le z_0) = \frac{1}{2\pi\sigma^2} \iint_{x_1 + x_2 \le 2z_0} \exp\left\{-\frac{1}{2\sigma^2}(x_1^2 + x_2^2)\right\} dx_1\,dx_2.$$


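Write $w = \tfrac{1}{2}(x_1 - x_2)$, so that $x_1 = z + w$, $x_2 = z - w$, $x_1^2 + x_2^2 = 2(z^2 + w^2)$ and
$dx_1\,dx_2 = 2\,dz\,dw$. The integral becomes

$$P(z \le z_0) = \frac{1}{\sigma\sqrt{\pi}} \int_{-\infty}^{z_0} \exp\left(-\frac{z^2}{\sigma^2}\right) dz. \tag{7.15}$$

Thus

$$P(z_0 - \tfrac{1}{2}dz_0 \le z \le z_0 + \tfrac{1}{2}dz_0) = \frac{1}{\sigma\sqrt{\pi}} \exp\left(-\frac{z_0^2}{\sigma^2}\right) dz_0, \tag{7.16}$$

a result which, remembering the relation between probability and the distribution function,
we may express by saying that z is distributed normally with variance $\tfrac{1}{2}\sigma^2$. The distribution
function of the statistic z is given by (7.15) and its frequency function by (7.16).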
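The conclusion that the mean of a sample of two has variance $\tfrac{1}{2}\sigma^2$ can be checked by an artificial sampling experiment. The sketch below is an illustration under stated assumptions: the value of sigma, the number of samples, the seed and the point $z_0$ are all arbitrary choices, and a pseudo-random generator stands in for simple random sampling.

```python
import random
from statistics import NormalDist, fmean

sigma = 2.0                      # assumed population standard deviation
rng = random.Random(0)           # fixed seed so that the run is repeatable

# 100,000 samples of two; z is the mean of each sample.
z = [(rng.gauss(0.0, sigma) + rng.gauss(0.0, sigma)) / 2 for _ in range(100_000)]

mean_z = fmean(z)
var_z = fmean([(v - mean_z) ** 2 for v in z])

# Observed proportion of samples with z <= z0, against the normal
# distribution with variance sigma^2 / 2.
z0 = 1.0
observed = sum(1 for v in z if v <= z0) / len(z)
expected = NormalDist(0.0, sigma / 2 ** 0.5).cdf(z0)

print(var_z, sigma ** 2 / 2)
print(observed, expected)
```

The observed variance and proportion agree with the theoretical values to within the fluctuations to be expected in an experiment of this size.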


Thus 


7.23. In the more general case of a statistic $z = z(x_1 \ldots x_n)$ we see that the
probability of $z \le z_0$ is obtained by integrating the joint distribution of $x_1 \ldots x_n$ over the
domain of x's such that $z_0 \ge z(x_1 \ldots x_n)$. This gives us the distribution function of the
random variable z defined in terms of the random variables $x_1 \ldots x_n$ by the equation
$z = z(x_1 \ldots x_n)$. We shall develop this subject systematically in Chapter 10.

When the x-values are chosen by a simple random process the distribution of z is called
a simple random sampling distribution, or more shortly a sampling distribution. Unless 
otherwise specified the words “ sampling distribution ” are always to be taken to refer to 
sampling under simple random conditions. 


Bayes’ Theorem 


7.24. We now revert to the theory of probability. Suppose that $q_1 \ldots q_n$ are
alternative propositions and let H be the information available, p some additional
information. Then by Rule 4

$$P(q_r p \mid H) = P(p \mid H)\,P(q_r \mid pH) = P(q_r \mid H)\,P(p \mid q_r H)$$

whence
$$\frac{P(q_r \mid pH)}{P(q_r \mid H)} = \frac{P(p \mid q_r H)}{P(p \mid H)}.$$
Thus
$$P(q_r \mid pH) = \frac{P(q_r \mid H)\,P(p \mid q_r H)}{P(p \mid H)}. \tag{7.17}$$
Since the truth of one of the q's is certain we have, summing for all q's,
$$1 = \sum_r \frac{P(q_r \mid H)\,P(p \mid q_r H)}{P(p \mid H)} \tag{7.18}$$
whence, from (7.17),
$$P(q_r \mid pH) = \frac{P(q_r \mid H)\,P(p \mid q_r H)}{\sum_r \{P(q_r \mid H)\,P(p \mid q_r H)\}} \tag{7.19}$$
or, for variations in $q_r$,
$$P(q_r \mid pH) \propto P(q_r \mid H)\,P(p \mid q_r H). \tag{7.20}$$
This is Bayes’ Theorem. It states that the probability of $q_r$ on data p and H is proportional
to the product of that of $q_r$ on H and p on $q_r$ and H.

The principal application of the theorem lies in reasoning from observed events to the
hypothesis which may explain them. The theory of this subject is accordingly known as that
of “inverse” probability. Suppose, in fact, that an event can be explained on the mutually
exclusive hypotheses $q_1 \ldots q_n$ and let H be the data known before the event happens, so
that H is the basis on which we first judge the relative probabilities of the q's. Now suppose
the event to happen. Then Bayes’ theorem states that the probability of $q_r$ after it has
happened (i.e. on data H and p) varies as the probability before it happened multiplied by the
probability that it happens on data $q_r$ and H. The probability $P(q_r \mid pH)$ is therefore called
the posterior probability, $P(q_r \mid H)$ the prior probability, and $P(p \mid q_r H)$ will be called the
likelihood.

In this book the word “likelihood” will be used solely in this special sense.


7.25. The practical use of Bayes’ theorem depends on a knowledge of the prior 
probabilities. When they are known we can calculate and compare the posterior prob- 
abilities of the hypotheses, and if we have to choose one in preference to others we choose the 
one with the greatest posterior probability. But we are rarely, if ever, given the prior 
probabilities. And this brings us to what is perhaps the most contentious point in the 
modern theory of probability. 

Bayes stated (though he appears to have felt more hesitation than most of his followers)
that if there was no known reason for supposing that the prior probabilities were different,
they were to be assumed equal. This is Bayes’ postulate, which is to be distinguished from
the theorem of (7.19). It immediately resolves the difficulty of applying the theorem, and
before discussing the postulate and describing other approaches to the matter, it may be
useful to give two examples of the use of the postulate in practical problems.


Example 7.6


An urn contains four balls, which are known to be either (a) all white, or (b) two white
and two black. A ball is drawn at random and found to be white. What is the probability
that all the balls are white?



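We have here two hypotheses, $q_1$ and $q_2$. On $q_1$ the probability of getting a white ball
is 1, on $q_2$ it is ½. From (7.19) we have

$$P(q_1 \mid pH) = \frac{P(q_1 \mid H)}{P(q_1 \mid H) + \tfrac12 P(q_2 \mid H)},$$

$$P(q_2 \mid pH) = \frac{\tfrac12 P(q_2 \mid H)}{P(q_1 \mid H) + \tfrac12 P(q_2 \mid H)}.$$

Now, in accordance with Bayes’ postulate we assume

$$P(q_1 \mid H) = P(q_2 \mid H) = \tfrac12$$

and find

$$P(q_1 \mid pH) = \tfrac23, \qquad P(q_2 \mid pH) = \tfrac13.$$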




We are thus led to prefer the hypothesis $q_1$ that all the balls are white, since this has the
greater posterior probability.
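
The arithmetic of Example 7.6 can be checked with a short sketch. The function name `posterior` and the use of exact fractions are illustrative choices, not part of the text; the calculation itself is simply (7.19) with equal prior probabilities.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Posterior probabilities by (7.19): prior times likelihood, normalised."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Example 7.6: q1 = all four balls white, q2 = two white and two black.
priors = [Fraction(1, 2), Fraction(1, 2)]      # Bayes' postulate: equal priors
likelihoods = [Fraction(1), Fraction(1, 2)]    # P(white | q1), P(white | q2)
post = posterior(priors, likelihoods)
print(post)  # [Fraction(2, 3), Fraction(1, 3)]
```

Changing `priors` shows directly how far the conclusion depends on the postulate of equal prior probabilities.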


Example 7.7 

From an urn full of balls of unknown colour a ball is drawn at random and replaced. 
The process is continued m times and a black ball is drawn each time. What is the prob- 
ability that if a further ball is drawn it will be black ? 

The question as framed does not admit of a definite answer, for, there being an infinite 
number of possible colours and combinations of colours, we do not know what are the 
hypotheses which are to be compared. Let us suppose that the balls are either black or 
white, and thus consider the hypotheses (1) that all are black, (2) that all but one are black, 
(3) that all but two are black, and so on. The problem still lacks precision, for the number 
of balls is not specified. Suppose there are N balls. We shall later let N tend to infinity to 
get the limiting case. 

Consider the hypothesis $q_R$ that there are R black balls and N − R white ones. The
probability of choosing a black ball is $R/N$ and that of doing so m times in succession, in
virtue of Rule 4, is $(R/N)^m$. If the q's have equal prior probabilities we have, from (7.19),

$$P(q_R \mid pH) = \frac{(R/N)^m}{\sum_{R=0}^{N} (R/N)^m}.$$

Now the probability of getting a further black ball on hypothesis $q_R$ is $R/N$. Since the
hypotheses q are mutually exclusive, the probability of getting a further black ball is, in
virtue of Rules 3 and 4,

$$\sum_{R=0}^{N} \frac{R}{N}\, P(q_R \mid pH) = \frac{\sum_{R=0}^{N} (R/N)^{m+1}}{\sum_{R=0}^{N} (R/N)^{m}}.$$

This is the answer to the limited form of the question. As N → ∞ this tends to the quotient
of definite integrals

$$\frac{\int_0^1 x^{m+1}\,dx}{\int_0^1 x^{m}\,dx} = \frac{m+1}{m+2}.$$


This is a particular case of the so-called Succession Rule of Laplace. Enthusiasts have 
applied it indiscriminately in some such unconditioned form as the statement that if an event 
is observed to happen m times in succession the chances are m + 1 to 1 that it will happen 
again. This is clearly unjustified. 
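
A small numerical sketch of Example 7.6's companion, Example 7.7, may make the limiting process concrete. It evaluates the finite-urn quotient of sums for increasing N and compares it with (m + 1)/(m + 2); the function name `prob_next_black` and the choice m = 3 are assumptions made only for illustration.

```python
from fractions import Fraction

def prob_next_black(m, N):
    """Finite-urn answer of Example 7.7:
    sum of (R/N)^(m+1) divided by sum of (R/N)^m, for R = 0..N."""
    num = sum(Fraction(R, N) ** (m + 1) for R in range(N + 1))
    den = sum(Fraction(R, N) ** m for R in range(N + 1))
    return num / den

m = 3
for N in (10, 100, 1000):
    print(N, float(prob_next_black(m, N)))
print("limit", (m + 1) / (m + 2))
```

The result rests entirely on the equal prior probabilities given to the hypotheses $q_R$, which is the point of the warning against using the rule in an unconditioned form.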


7.26. The principal difficulties arising out of Bayes’ postulate appear from the
standpoint of the frequency theory of probability. If we adopt the axiomatic approach, in which
probability is a measure of attitudes of mind, it is reasonable to take prior probabilities to be
equal when nothing is known to the contrary, for the mind holds them in equal doubt. The
frequency theory, however, would require the states of events corresponding to the various
q's to be distributed with equal frequency in some population from which the actual q has
emanated, if Bayes’ postulate is to be applied. This has appeared to some statisticians,
though not to all, to be asking too much of the universe. The postulate is one of the crucial
points in the theory of probability. Adherents of the axiomatic school accept it. Many of
those of the frequency school explicitly reject it.

There is still so much disagreement on this subject that one cannot put forward any set 
of viewpoints as orthodox. One thing, however, is clear—anyone who rejects Bayes’ 
postulate must put something in its place. The problem which Bayes attempted to solve is 
supremely important in scientific inference and it scarcely seems possible to have any 
scientific thought at all without some solution, however intuitive and however empirical, to 
the problem. We are constantly compelled to assess the degree of credence to be accorded 
to hypotheses on given data; the struggle for existence, in Thiele’s phrase, compels us to 
consult the oracles. 


The Principle of Maximum Likelihood 


7.27. The school of statisticians which rejects Bayes’ postulate has substituted for it
an apparently different principle based on the use of likelihood. Reverting to equation (7.19)
we see that for any $q_r$ and H

$$P(q_r \mid pH) \propto P(q_r \mid H)\, L(p \mid q_r H), \tag{7.21}$$

where we now write $L(p \mid q_r H)$ for the likelihood function. The Principle of Maximum
Likelihood states that when confronted with a choice of hypotheses q we are to select that one
(if it exists) which maximises $L(p \mid q_r H)$. In other words, we are to choose the hypothesis
which gives the greatest probability to the observed event.

It is to be particularly noted that this is not the same thing as choosing the hypothesis
with the greatest probability. In fact, some adherents of the frequency theory of probability
deny any meaning to the expression “probability of a hypothesis”, and the principle of
maximum likelihood was introduced largely to replace the notion of “inverse” probability
which leads to the use of such a phrase.


7.28. Suppose (as is nearly always the case in statistical work) that the hypotheses
with which we are concerned assert something about the numerical value of a parameter $\theta$.
In such a case we shall speak of a statistical hypothesis. For instance, the hypotheses might
be $q_1 \equiv \theta < 0$, $q_2 \equiv \theta > 0$, in which case there are two alternatives. Or we might have
$q_1 \equiv \theta = 1$, $q_2 \equiv \theta = 2$, and so on, in which case there is a denumerable infinity of
hypotheses.

If now $\theta$ can have only discontinuous values, we may, on observing an
event p, require to estimate $\theta$, or to ask what is the “best” value of $\theta$ on data
p. The method of Bayes would state that the “best” value is that with the greatest posterior
probability. In (7.21) we should seek for that $q_r$ which made $P(q_r \mid pH)$ a maximum, and, knowing
nothing of the prior probabilities $P(q_r \mid H)$, we should, in accordance with Bayes’ postulate, assume
all such probabilities equal. We then merely have to find that $q_r$ which maximises $L(p \mid q_r H)$.
In other words, the postulate of Bayes and the principle of maximum likelihood result in the
same answer and are equivalent.




7.29. This position apparently does not hold if the permissible values of $\theta$ are
continuous. We must now replace such expressions as $P(q_r \mid H)$ by
$P(\theta_r - \tfrac12 d\theta_r \le \theta \le \theta_r + \tfrac12 d\theta_r \mid H)$ and in place of (7.21) we get

$$P(\theta_r - \tfrac12 d\theta_r \le \theta \le \theta_r + \tfrac12 d\theta_r \mid pH) \propto P(\theta_r - \tfrac12 d\theta_r \le \theta \le \theta_r + \tfrac12 d\theta_r \mid H) \times L(p \mid \theta_r - \tfrac12 d\theta_r \le \theta \le \theta_r + \tfrac12 d\theta_r,\, H). \tag{7.22}$$

If we now require the “best” value of $\theta$, we should, in accordance with Bayes’ postulate,
take the prior probability to be a constant and once again we should have to maximise L for
variations of $\theta$.

We might, however, have chosen to represent our hypotheses, not by $\theta$, but by some
variate $\phi$ functionally related to $\theta$, e.g. the standard deviation instead of the variance. In
this case we should have reached equation (7.22) with $\phi$ written everywhere instead of $\theta$;
we should have taken the prior probability as constant; and we should have arrived at
the conclusion that we should maximise L for variations of $\phi$.

But are we being consistent in so doing? If we assume that the elementary intervals
of $\theta$ are equiprobable we cannot assume the same of $\phi$, and thus the use of Bayes’ postulate
appears to involve self-contradiction. The principle of maximum likelihood is free from
this difficulty, for if $L(\theta)$ is to be maximised for variations of $\theta$ it will, at the same time, be
maximised for variations of $\phi$, since

$$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial \phi}\,\frac{\partial \phi}{\partial \theta}$$

and the two sides of this equation vanish together.
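
The invariance just described can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the text: it takes a small made-up sample from a normal population whose mean is assumed known and equal to zero, maximises the likelihood once with the variance as parameter and once with the standard deviation, and confirms that the two maxima correspond. The helper `argmax_unimodal` is a plain ternary search written for the purpose.

```python
import math

def argmax_unimodal(f, lo, hi, iters=200):
    """Ternary search for the maximiser of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2

x = [1.2, -0.7, 2.1, 0.3, -1.5, 0.8]   # illustrative sample; mean assumed known (zero)
n, S = len(x), sum(v * v for v in x)

def loglik_variance(v):
    """Log-likelihood (up to a constant) as a function of theta = variance."""
    return -0.5 * n * math.log(v) - S / (2 * v)

def loglik_sd(s):
    """The same likelihood expressed in phi = standard deviation."""
    return loglik_variance(s * s)

theta_hat = argmax_unimodal(loglik_variance, 1e-3, 50.0)
phi_hat = argmax_unimodal(loglik_sd, 1e-3, 50.0)
print(theta_hat, phi_hat ** 2, S / n)   # the three values agree closely
```

Both searches arrive at the same hypothesis, whereas a prior taken as constant in the variance is not constant in the standard deviation.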


7.30. This is one of the grounds on which adherents of the frequency school have
rejected Bayes’ postulate in favour of the principle of maximum likelihood; but in my view
the matter has been misunderstood. It would seem that Bayes’ postulate and the principle
give the same answer in the continuous case as well as in the discontinuous case when proper
regard is had to the limiting processes involved. We saw in 7.13 that in speaking of
probability in a continuum it was essential to specify the nature of the process to the limit.
If we regard $\theta$ (from the frequency viewpoint) as having emanated from a population by
a process random in the limit for intervals $d\theta$, then Bayes’ postulate applied to this process
will clearly give a different answer from that obtained by supposing that $\theta$ emanated by
a process random in the limit for $d\phi$ $\left(= \frac{\partial\phi}{\partial\theta}\, d\theta\right)$. The two are different just as the
probabilities in Example 7.5 are different, and for the same reason. Thus the apparent
inconsistency is not an inconsistency at all, but a difficulty introduced by ignoring the limiting
process in continuous populations.*

* A further difficulty arises if $\theta$ can lie in an infinite range, for then Bayes’ postulate apparently
leads to the conclusion that prior probabilities in any finite range are zero and hence so are posterior
probabilities. This does not arise in the likelihood method. Looking at the problem generally, we
need not be surprised that the difficulty appears since the ranging of $\theta$ over an infinite range is also
a limiting process. In practice we are never so ignorant a priori as to suppose that $\theta$ can be any value
however large with the same probability, and if we consider the range as determinate but unknown,
likelihood and Bayes’ postulate continue to be applicable and to give the same results.

For an extended discussion of this subject reference may be made to Kendall (1940).
In the present volume it need not concern us to take it farther, though considerable use will
be made of the principle of maximum likelihood in Volume 2. It will there be seen that the
principle has many important statistical properties. No one, in fact, denies the importance
of the principle or its usefulness in certain cases; the controversy hitherto has centred on
the considerations by which the acceptance of the principle as a rule of conduct is to be
justified. The reader who cannot accept Bayes’ postulate and the foregoing argument that it is
virtually identical with the principle has a choice of courses. He can accept the principle
as a new and distinct postulate of scientific inference; he can regard it as justified by its
mathematical and statistical properties; or he can rely on a more sophisticated approach
which will be touched on in Chapter 9, namely, that the principle leads to estimates of
parameters with minimum sampling variance when such exist. At this stage he may be
prepared to accept it on intuitive grounds.*


7.31. Although in the remainder of the present volume Bayes’ postulate and the
principle of maximum likelihood will not often appear explicitly, we shall frequently use
a type of argument which is, in the ultimate analysis, based on them. A certain event or
series of events is observed; on a hypothesis H the occurrence of these events is found to be
highly improbable; and therefore H is rejected in favour of some hypothesis which makes
the observations more probable. To take a very simple example, we toss a penny twenty
times and find that it comes down heads every time. If the penny were unbiased
(hypothesis H) the odds against this event would be $2^{20} - 1$ to 1. Thus we reject H in
favour of the hypothesis that it is in fact biased in favour of the heads.

It will readily be seen that this type of argument is a somewhat indefinite form of the
inverse type with which we have been concerned. The chief difference lies in the fact that
it is used to reject unlikely hypotheses rather than to accept the most likely, possibly a safer
but certainly a less precise procedure.


The Central Limit Theorem 


7.32. To conclude this chapter we prove an important theorem which gives the normal
distribution a central place in the theory of probability. It has
already been shown that the distribution appears as the limit of
the Pearson Type III distribution when expressed in standard measure. We shall prove
a much more general result, due to Laplace but first proved rigorously by Liapounoff, that
under certain conditions the sum of n independent random variables distributed in
whatever form tends to the normal form.

We begin with a simple but powerful result connected with the characteristic
functions of sums of independent random variables. If we have n such variables
distributed as $dF_1, \ldots, dF_n$, the element of frequency of their sum
$z = x_1 + x_2 + \ldots + x_n$ is the integral of $dF_1 \ldots dF_n$ through the element of
volume between z and z + dz. Thus the characteristic function of their sum, being the
integral of $e^{itz}$ through the range of z, is equal to

$$\int\!\cdots\!\int e^{itz}\, dF_1 \ldots dF_n = \int_{-\infty}^{\infty} e^{itx_1}\, dF_1 \int_{-\infty}^{\infty} e^{itx_2}\, dF_2 \cdots \int_{-\infty}^{\infty} e^{itx_n}\, dF_n.$$

That is to say, the characteristic function of the sum of a number of independent random
variables is the product of their characteristic functions. The cumulative function is
accordingly the sum of their cumulative functions.

* An approach of a rather different kind has been developed in recent years by Neyman (1937),
who bases his theory of inference only on direct probabilities. An account of this theory will be given
in the second volume.

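
The product rule for characteristic functions is easily verified for discrete variables. The following sketch is illustrative only: the two distributions (a fair die and a fair coin scored 0 or 1) and the helper names `char_fn` and `sum_dist` are assumptions chosen for the example.

```python
import cmath
from itertools import product

def char_fn(dist, t):
    """Characteristic function E[exp(itX)] of a discrete distribution {value: prob}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())

def sum_dist(d1, d2):
    """Distribution of X1 + X2 for independent X1 ~ d1 and X2 ~ d2 (by convolution)."""
    out = {}
    for (x1, p1), (x2, p2) in product(d1.items(), d2.items()):
        out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

die = {k: 1 / 6 for k in range(1, 7)}   # a fair die
coin = {0: 0.5, 1: 0.5}                 # a fair coin scored 0 or 1
total = sum_dist(die, coin)

for t in (0.3, 1.0, 2.5):
    lhs = char_fn(total, t)                       # c.f. of the sum
    rhs = char_fn(die, t) * char_fn(coin, t)      # product of the separate c.f.s
    print(t, abs(lhs - rhs))                      # differences at rounding-error level
```

Taking logarithms of both sides gives the corresponding additive rule for cumulative functions.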
Now as to the Central Limit Theorem itself. We first of all outline the proof briefly
and unrigorously to indicate its essential features, and then give a rigorous proof. Suppose
we have distributions $F_1, \ldots, F_n$ all with finite second moments and with characteristic
functions $\phi_1, \ldots, \phi_n$. We have for any $F_j$

$$\phi_j(t) = 1 + \mu'_{1j}(it) + \mu'_{2j}\frac{(it)^2}{2!} + R_j,$$

where $R_j$ is a remainder term. Similarly we have

$$\psi_j(t) = \kappa_{1j}(it) + \kappa_{2j}\frac{(it)^2}{2!} + R'_j.$$

Hence the cumulative function of the sum of the independent variates will be

$$\psi(t) = (it)\sum \kappa_{1j} + \frac{(it)^2}{2!}\sum \kappa_{2j} + \sum R'_j.$$

We can without loss of generality take the mean of the sum as origin, so that $\sum \kappa_{1j} = 0$, and
now transforming to standard measure by the transformation $\xi = z/\sqrt{\sum \kappa_{2j}}$ we find

$$\psi(t) = -\frac{t^2}{2!} + O\!\left(\frac{\sum R'_j}{\left(\sum \kappa_{2j}\right)^{3/2}}\right).$$

Since $\sum \kappa_{2j}$ is of order n the remainder term will be of order $n/n^{3/2}$, i.e. of order $n^{-1/2}$; and thus
tends to zero. We shall then have

$$\lim \psi(t) = -\tfrac12 t^2, \qquad \lim \phi(t) = e^{-\frac12 t^2},$$

and hence in virtue of the converse of the First Limit Theorem (4.12) the distribution of
the sum of the random variables tends to normality.
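
The convergence of the characteristic function can be watched numerically in a simple case. The sketch below is an illustration, not part of the proof: it assumes n independent variables all rectangularly distributed on $(-\sqrt3, \sqrt3)$, so that each has zero mean and unit variance and characteristic function $\sin(\sqrt3 s)/(\sqrt3 s)$, and compares the characteristic function of the standardised sum with $e^{-t^2/2}$.

```python
import math

def cf_uniform_unit_variance(s):
    """C.f. of the rectangular distribution on (-sqrt(3), sqrt(3)),
    which has zero mean and unit variance: sin(sqrt(3) s) / (sqrt(3) s)."""
    u = math.sqrt(3.0) * s
    return 1.0 if u == 0.0 else math.sin(u) / u

def cf_standardised_sum(t, n):
    """C.f. of (x_1 + ... + x_n) / sqrt(n): by the product rule, phi(t / sqrt(n)) ** n."""
    return cf_uniform_unit_variance(t / math.sqrt(n)) ** n

t = 1.0
errors = [abs(cf_standardised_sum(t, n) - math.exp(-t * t / 2)) for n in (1, 10, 100, 1000)]
print(errors)   # the discrepancy shrinks as n grows
```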


7.33. The rigorous enunciation of the theorem and its proof are as follows:

If n independent random variables are distributed in the forms $F_1, \ldots, F_n$ with finite
variances $\kappa_{21}, \ldots, \kappa_{2n}$ and $M_n = \sum_{j=1}^{n} \kappa_{2j}$, then the sum of the variables divided by $\sqrt{M_n}$
tends to the normal form, provided that for any $\epsilon > 0$

$$\lim_{n \to \infty} \frac{1}{M_n} \sum_{j=1}^{n} \int_{|x| > \epsilon\sqrt{M_n}} x^2\, dF_j = 0. \tag{7.23}$$

The implications of this condition, which is a modification by Cramér of one due to
Lindeberg, are not very obvious, but it involves that

$$M_n \to \infty \quad \text{and} \quad \frac{\kappa_{2n}}{M_n} \to 0, \tag{7.24}$$

in other words, that the total variance tends to infinity but that the proportional contribution
of each constituent tends to zero. To see that (7.24) follows from (7.23) we note that if $M_n$
does not tend to infinity it must, being an increasing function, tend to a constant. It would
follow from (7.23) that the sum of the integrals, each of which is positive and not small for
every $\epsilon$, would tend to zero, which is impossible. Further, if $\kappa_{2n}/M_n$ did not tend to zero, then
at least one of the terms in (7.23) would not do so, and thus the sum would not do so.


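We have

$$\phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = \int_{-\infty}^{\infty} e^{itx/\sqrt{M_n}}\, dF_j = \left\{ \int_{|x| > \epsilon\sqrt{M_n}} + \int_{|x| \le \epsilon\sqrt{M_n}} \right\} e^{itx/\sqrt{M_n}}\, dF_j.$$

Expanding the exponential with a Maclaurin remainder we have

$$\phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = \int_{|x| > \epsilon\sqrt{M_n}} \left\{ 1 + \frac{itx}{\sqrt{M_n}} + \vartheta\,\frac{(it)^2 x^2}{2M_n} \right\} dF_j + \int_{|x| \le \epsilon\sqrt{M_n}} \left\{ 1 + \frac{itx}{\sqrt{M_n}} + \frac{(it)^2 x^2}{2M_n} + \vartheta\,\frac{(it)^3 x^3}{6M_n^{3/2}} \right\} dF_j,$$

where $\vartheta$ denotes a quantity, not necessarily the same at each occurrence, whose modulus does
not exceed unity. We may without loss of generality suppose the mean to be zero and hence we find

$$\phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = 1 - \frac{\kappa_{2j} t^2}{2M_n} + \vartheta\,\frac{t^2}{M_n} \int_{|x| > \epsilon\sqrt{M_n}} x^2\, dF_j + \vartheta\,\frac{|t|^3}{6M_n^{3/2}} \int_{|x| \le \epsilon\sqrt{M_n}} |x|^3\, dF_j.$$

Thus for some T > 1 we have, for |t| < T, remembering that

$$\int_{|x| \le \epsilon\sqrt{M_n}} |x|^3\, dF_j \le \epsilon\sqrt{M_n} \int_{|x| \le \epsilon\sqrt{M_n}} x^2\, dF_j \le \epsilon\sqrt{M_n}\,\kappa_{2j},$$

$$\phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = 1 - \frac{\kappa_{2j} t^2}{2M_n} + \vartheta\, T^3 \left( \frac{\epsilon\,\kappa_{2j}}{M_n} + \frac{1}{M_n} \int_{|x| > \epsilon\sqrt{M_n}} x^2\, dF_j \right).$$

Hence, in virtue of (7.24) the coefficient of $\vartheta$ is as small as we please and thus $\phi_j(t/\sqrt{M_n})$
tends to unity as n → ∞ uniformly for |t| < T. Thus we have

$$\log \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = (1 + y)\left\{ \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) - 1 \right\}$$

for sufficiently large n and $|y| < \epsilon$. Thus for $\epsilon < \tfrac12$,

$$\log \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = -\frac{\kappa_{2j} t^2}{2M_n} + 2\vartheta\, T^3 \left( \frac{\epsilon\,\kappa_{2j}}{M_n} + \frac{1}{M_n} \int_{|x| > \epsilon\sqrt{M_n}} x^2\, dF_j \right).$$

Summing for j we have, in virtue of (7.23),

$$\sum_{j=1}^{n} \log \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = -\frac{t^2}{2} + 2\vartheta\, T^3 (\epsilon + \text{vanishing quantity})$$

and thus for |t| < T

$$\lim_{n \to \infty} \sum_{j=1}^{n} \log \phi_j\!\left(\frac{t}{\sqrt{M_n}}\right) = -\frac{t^2}{2},$$

the convergence being uniform in any finite t-interval. The theorem follows from the
converse of the First Limit Theorem.
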



7.34. The following comments will amplify the above proof.

(a) The Lindeberg condition (7.23) is necessary as well as sufficient. A proof is given by
Cramér (1937).

(b) The condition may be put in other forms, for which see Cramér (1937), Uspensky
(1937) and the original memoir by Liapounoff (1901).

(c) The sum of random variables whose distributions have not a finite second moment
may not tend to normality. It will be seen in Chapter 9 that the mean of n variables each of
which is distributed in the form

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + x^2}, \qquad -\infty < x < \infty,$$

is also distributed in that form, however large n may be.

(d) Liapounoff has also given some remarkable results showing how close the limiting
form is to the sum of n variables. In fact, if $F_n$ is the distribution function of the sum,
and F that of the normal form,

$$|F_n - F| < c\,\rho_{3n}\,\frac{\log n}{\sqrt{n}},$$

where c is a constant and $\rho_{3n}$ is a function of the third moments of the constituent distributions.
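
Comment (c) of 7.34 can be checked through characteristic functions. The sketch below relies on the standard fact, assumed here rather than derived in the text, that the distribution $dF = dx/\{\pi(1 + x^2)\}$ has characteristic function $e^{-|t|}$; the mean of n independent such variables then has characteristic function $\{e^{-|t|/n}\}^n$, which is again $e^{-|t|}$ whatever n may be.

```python
import math

def cf_cauchy(t):
    """Characteristic function of dF = dx / (pi (1 + x^2)), namely exp(-|t|)."""
    return math.exp(-abs(t))

def cf_mean_of_n(t, n):
    """C.f. of the mean of n independent such variables: phi(t / n) ** n."""
    return cf_cauchy(t / n) ** n

for n in (1, 10, 1000):
    gaps = [abs(cf_mean_of_n(t, n) - cf_cauchy(t)) for t in (0.5, 1.0, 3.0)]
    print(n, max(gaps))   # zero apart from rounding error, for every n
```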


NOTES AND REFERENCES 


The logic of the theory of probability will be found dealt with in the books by Keynes 
(1921), F. P. Ramsey (1931) and Johnson (1921). All these take the axiomatic approach 
from probability as an undefined idea. The frequency approach has been discussed from the 
more logical angle by Venn (1888), whose book, though out of print and to some extent out of 
date, is still worth reading. 

The mathematical theory of probability has been treated by Lévy (1925), Jeffreys
(1939) and Uspensky (1937), all three books excellent of their kind. Von Mises’ approach is
described in his book (1936) and an axiomatisation in a paper by Dörge (1934). See also
Kendall (1941).

For inverse probability and likelihood see the review by Kendall (1940). There are 
scores of papers, mostly controversial in character, on this subject, but a beginning of 
a systematic reading may be made with the papers by Fisher (1921, 1930), Neyman (1937), 
and the book by Jeffreys (1939). 

For the central limit theorem see Cramér (1937), and for an extension to the case when
the variables are dependent, Bernstein (1927).


Bayes, T. (1763), “An essay towards solving a problem in the doctrine of chances,” Phil.
Trans., 53, 370.

Bernstein, S. (1927), “Sur l'extension du théorème limite du calcul des probabilités aux
sommes de quantités dépendantes,” Math. Ann., 97, 1.

Borel, E. (editor) (1925 and later years), Traité du Calcul des Probabilités et de ses Applications,
Gauthier-Villars, Paris.

Cramér, H. (1937), Random Variables and Probability Distributions, Cambridge University
Press.

Dörge, K. (1934), “Eine Axiomatisierung der von Misesschen Wahrscheinlichkeitstheorie,”
Jber. dtsch. Mat. Ver., 43, 39.

Fisher, R. A. (1921), “On the mathematical foundations of theoretical statistics,” Phil.
Trans., A, 222, 309.

Fisher, R. A. (1930), “Inverse Probability,” Proc. Camb. Phil. Soc., 26, 528.

Jeffreys, H. (1939), The Theory of Probability, Oxford University Press.

Johnson, W. E. (1921-24), Logic (3 volumes), Cambridge University Press.

Kendall, M. G. (1940), “On the method of maximum likelihood,” J. Roy. Statist. Soc., 103,
388.

Kendall, M. G. (1941), “A theory of randomness,” Biometrika, 32, 1.

Keynes, J. M. (1921), A Treatise on Probability, Macmillan.

Kolmogoroff, A. (1933), Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin.

Lévy, P. (1925), Calcul des Probabilités, Gauthier-Villars, Paris.

Liapounoff, A. (1901), “Nouvelle forme du théorème sur la limite de probabilité,” Mém.
Acad. Sci. St. Pétersbourg, 12, No. 5.

Mises, R. von (1936), Wahrscheinlichkeit, Statistik und Wahrheit, Springer. English
translation, 1939, as Probability, Statistics and Truth, W. Hodge.

Neyman, J. (1937), “Outline of a theory of statistical estimation based on the classical theory
of probability,” Phil. Trans., A, 236, 333.

Ramsey, F. P. (1931), The Foundations of Mathematics, Kegan Paul.

Uspensky, J. V. (1937), Introduction to Mathematical Probability, McGraw-Hill, New York
and London.

Venn, J. A. (1888), The Logic of Chance, Macmillan.


EXERCISES 


7.1. … and if (ab . . . f) is the number of objects possessing A, B, . . . F,
show that the number of objects possessing at least one of A, B, . . . K is

$$\sum (a) - \sum (ab) + \sum (abc) - \ldots \pm (ab \ldots k).$$


In each of a packet of cigarettes there is one of a se 
If a number N of packets is bought at random th 


OY EE ema,” ila)” 


7.2. Three points are taken at random on a circle. Show that the probability that
they lie on the same semi-circle is ¾. (Assume that in the limit elementary intervals of arc
are equiprobable.)

Explain the fallacy in the following argument: One pair of the points must
lie on a semicircle terminating at one of them. The probability that the third point lies
on this semicircle is ½, which is therefore the required answer.


7.3. A simple random sample of n values, $x_1, \ldots, x_n$, is drawn from the normal
population

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac12 \left( \frac{x - m}{\sigma} \right)^2 \right\} dx, \qquad -\infty < x < \infty.$$

Show that the value of m which maximises the likelihood of this event is

$$m = \frac{1}{n} \sum (x),$$

which is therefore the “best” estimate of the mean of the population.
7.4. Show that if p is the probability of a zero in the Irregular Kollektiv the
probability $u_n$ that there will be r consecutive zeros in a set of n members chosen at random obeys
the recurrence relation

$$u_{n+1} = u_n + (1 - u_{n-r})\, p^r (1 - p)$$

and hence that 

Un = I= Un, r sis pus 

[ут ; 

= 1 ВТ 1 

where = 2 (=) (, Si x ira — р) у. 


j-0 


7.5. From a heap of counters of unknown number N a player takes a handful of
n at random. Examine this argument: it is an even chance whether N is odd or even.
If it is odd, the probability that n is odd is greater than ½, whereas if it is even the probability
that n is odd is ½. Thus the probability that n is odd is greater than ½, and the player
should bet on getting an odd number.


7.6. An event happens at random on an average once in time t. Regarding
occurrences in equal small intervals as equiprobable, show that the probability that it does not
happen in a specified interval T is exp (− T/t).


7.7. If a is rectangularly distributed in the interval 0 ≤ a ≤ 6 and b rectangularly
in the interval 0 ≤ b ≤ 9 show that the probability that x² − ax + b = 0 has two real
roots is ⅓.


7.8. An unbiassed coin is tossed n times and it is known that exactly m gave heads.
Show that the probability that the number of heads exceeded the number of tails throughout
the tossing is (2m − n)/n.
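
The result of Exercise 7.8 can be checked for small cases by direct enumeration. The sketch below is illustrative; the function name and the particular values of n and m are assumptions. It counts, among all orderings of m heads and n − m tails, those in which heads lead strictly after every toss, and sets the proportion beside (2m − n)/n, the classical ballot-theorem value when heads outnumber tails.

```python
from fractions import Fraction
from itertools import combinations

def prob_heads_always_ahead(n, m):
    """Fraction of the orderings of m heads and n - m tails in which heads
    lead tails strictly after every toss (found by exhaustive enumeration)."""
    favourable = total = 0
    for head_positions in combinations(range(n), m):
        heads = set(head_positions)
        lead, ahead = 0, True
        for i in range(n):
            lead += 1 if i in heads else -1
            if lead <= 0:          # heads no longer strictly ahead
                ahead = False
                break
        favourable += ahead
        total += 1
    return Fraction(favourable, total)

for n, m in [(5, 3), (6, 4), (8, 5)]:
    print(n, m, prob_heads_always_ahead(n, m), Fraction(2 * m - n, n))
```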


CHAPTER 8 


RANDOM SAMPLING 
The Sampling Problem 


8.1. In the previous chapter we have referred incidentally to the sampling problem,
which can be stated quite simply: given a sample from a population, to determine from it
the properties of that population. We noted that only in exceptional cases is it possible to
make assertions about the population with complete certainty, and that consequently it is
necessary to fall back on statements of a less categorical kind expressible in terms of
probability.


8.2. … of probability to this problem it is necessary … In practice we often meet with
samples which are not random, having been chosen purposively for some reason or other. In such
circumstances … kind. No numerical estimate of the probabilities can be made … random
sampling becomes of primary importance in statistical investigations from sample to population.
From this point onwards we shall deal only with random samples, and to avoid constant
repetition shall leave it to be understood that where a “sample” or a “sampling distribution”
is referred to, random conditions are assumed.


8.3. It is useful to begin a discussion of random sampling by considering the types of
parent population from which samples can be chosen. … members after withdrawal. The two
cases are sometimes distinguished as “sampling without replacement” and “sampling with
replacement”, …

Furthermore, we may also in many cases regard the sampling as simple to an adequate
approximation even when there is no replacement. If the population is large compared with
the size of the sample, the … materially affect the … supply. We … rather a different sense,
namely, that of a limiting form. We may, … a sample from the positive integers or the re…
the probability of … latter case presents … necessarily regard …



Thus, if we replace an observational distribution by a conceptual continuous mathematical
distribution, we replace at the same time a finite population by an infinite population.
The drawing of random samples from such a population is attended by the circumstances 
referred to in 7.13 and 7.29, namely, that the process to the limit must be taken into account. 

(c) Thirdly, the population may be purely hypothetical. Consider, for example, the 
throws of a die. We may picture the continual throwing as a sampling process drawing 
existent members from some non-existent population. In such cases what we are really 
doing is constructing by mental fiction an imaginary population round the sample. 

The concept of the hypothetical population is necessitated by ideas of frequency in 
probability. It is not required (and indeed has been explicitly rejected by Jeffreys) in the 
approach which takes probability as an undefinable measurement of attitudes of doubt. 
But if we take probability as a relative frequency, then to speak of the probability of a sample 
such as that given by throwing a die or growing wheat on a plot of soil, we must consider the 
sample against the background of a population. There are obvious logical difficulties in 
regarding such a sample as a selection—it is a selection without a choice—and still greater 
difficulties about supposing the selection to be random ; for to do so we must try to imagine 
that all the other members of the population, themselves imaginary, had an equal probability 
of assuming the mantle of reality, and that in some way the actual event was chosen to do so. 
This is, to me at all events, a most baffling conception. At the same time, it has to be 
admitted that certain events such as dice-throwing do happen as if the constituents were 
chosen at random from an existent population, and it accordingly seems that the concept of 
the hypothetical population can be justified empirically. 


Randomness in Sampling 

8.4. In its colloquial use the word “random” is applied to any method of choice 
which lacks aim or purpose. We speak of drawing names at random out of a hat, choosing 
plants at random from a field of corn, selecting family budgets at random from the popula- 
tion, meaning thereby that the selection is completely haphazard. 

Now it is found in practice that choice by a human being is not random in the stricter 
sense that it produces equally frequently events which we are entitled to expect to have 
equal prior probabilities. Some examples will make this clear. 


Example 8.1


In the course of certain work at the Rothamsted Experimental Station sets of eight
wheat plants were chosen for measurement. Six of these were chosen by approved methods,
referred to below, and may be taken to be truly random. The other two were chosen
haphazardly by eye. If, in any set, the eight plants were ranged in order of magnitude, the
two selected by eye could have any number from one to eight; and if they, in common with
the other six, were chosen at random, they should occupy these places with approximately
equal frequency in a large number of sets. Table 8.1 shows what actually occurred on two
different occasions (a) on May 31st, before the ears of wheat had formed, and (b) on June 28th,
after the ears had formed.


TABLE 8.1


Distribution of Plants chosen haphazardly in Ranks 1 to 8.
(F. Yates, Ann. Eugen. Lond., 6, 202.)


                                Numbers bearing Specified Rank.
Date.        Observation.       1     2     3     4     5     6     7     8     TOTAL.
May 31st     Shoot height       …     …     …     …     …     …     …     …     …
June 28th    Ear height         9    19    27    28     …     …     …     …     …


The divergence of actual from expected results is quite striking. On May 31st, before
the ears had formed, the observer was strongly biased towards the taller shoots; whereas in
June he was biased towards the central plants and avoided short and tall plants.
Bias may thus appear even in a trained observer, and the bias need
not be the same, leading to over- or under-estimation in different circumstances.

Example 8.2 


The following table shows the frequencies of final digits in a number of measurements
made by four different observers:


TABLE 8.2


Bias in Scale Reading. Distribution of Final Digits in Measurements by
Four Observers.

(G. U. Yule, J.R. Statist. Soc., 90, 570.)


Final Digit.      Frequency of Final Digit per 1000.
                    A       B       C       D
    0             158     122     251     358
    1              97      98      37      49
    2             125      98      80      90
    3              73      90      72      63
    4              76     100      55      37
    5              71     112     222     211
    6              90      98      71      62
    7              56      99      75      70
    8             126     101      72      44
    9             129      81      65      16
  TOTAL          1001     999    1000    1000


… showed some preference for 0. Observer C w… measurement in two
to the whole or half unit. Observer D was obviously very bad indeed, nearly 57 per cent of
his measurements being rounded off to the whole or half unit.

The observations were all made by reading a scale, those under A being on drawings to
the nearest tenth of a millimetre, those under B, C, and D being measurements on the heads
of living subjects to the nearest millimetre. We may conclude from this that different
observers may exhibit different degrees of bias even under comparable circumstances, and
that even those who are aware of the existence of the possibility of bias and the necessity for
taking great care (as observer A was) may nevertheless fail to avoid it.
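
The departure of each observer from equal frequencies can be summarised with a short sketch using the figures of Table 8.2. Two assumptions should be noted: the per-1000 frequencies are treated as if they were counts, so that the quantity computed is only a rough index of discrepancy and not a formal test, and the function names are illustrative.

```python
# Frequencies of final digits 0..9 per 1000 readings, copied from Table 8.2.
table = {
    "A": [158, 97, 125, 73, 76, 71, 90, 56, 126, 129],
    "B": [122, 98, 98, 90, 100, 112, 98, 99, 101, 81],
    "C": [251, 37, 80, 72, 55, 222, 71, 75, 72, 65],
    "D": [358, 49, 90, 63, 37, 211, 62, 70, 44, 16],
}

def discrepancy_from_equal(freqs):
    """Sum of (observed - expected)^2 / expected, with all digits expected equally."""
    expected = sum(freqs) / len(freqs)
    return sum((f - expected) ** 2 / expected for f in freqs)

def share_whole_or_half(freqs):
    """Proportion of readings whose final digit is 0 or 5."""
    return (freqs[0] + freqs[5]) / sum(freqs)

for name, freqs in table.items():
    print(name, round(discrepancy_from_equal(freqs), 1), round(share_whole_or_half(freqs), 3))
```

With unbiased reading about one-fifth of the final digits would be 0 or 5; the proportions for C and D are far above this.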


Example 8.3 

An observer was placed before a machine consisting of a circular disc divided into ten
equal sections in which were inscribed the digits 0 to 9. The disc rotated at high speed and
every now and then a flash occurred from a nearby electric lamp of such short duration that
the disc appeared at rest. The observer had to watch the disc and write down the number
occurring in the division indicated by a fixed pointer.

This was a machine designed for the provision of truly random numbers (see below, 8.10) 
and had been found by another observer to do so. But this particular observer produced 
a definite bias. The frequencies of digits in 10,000 run off by him are shown in Table 8.3. 


TABLE 8.3


Distribution of Digits obtained by an Observer in using a Randomising Machine.
(Kendall and Babington Smith, Supp. J.R. Statist. Soc., 6, 51.)


Digit.          0      1      2      3      4      5      6      7      8      9     TOTAL.
Frequency.   1083    865   1053    884   1057   1007   1081    997   1025    948     10,000


If the observer was unbiased the digits should appear in approximately equal numbers ; 
but there is a bias in favour of all the even numbers and against the odd numbers 1, 3 and 9. 
The cause of this bias is obscure, for the observer did not have to estimate (as in the previous 
example) but merely to write down something which he saw, or thought he saw. The 
explanation seemed to be that he had a strong number-preference, i.e. that he actually mis- 
saw the numbers, or that his brain controlled his ocular impressions and censored them. We 
have here to deal with one of the deadliest forms of bias in psychology. 
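The size of the departure from equal frequencies in Table 8.3 can be seen by a short calculation. The following is a minimal sketch in Python; the goodness-of-fit statistic it computes (the sum of squared deviations divided by the expected frequency) is an assumption of the sketch and is not a test applied in the text at this point.

```python
# Digit frequencies from Table 8.3 (digits 0 to 9, 10,000 readings in all).
observed = [1083, 865, 1053, 884, 1057, 1007, 1081, 997, 1025, 948]

total = sum(observed)
expected = total / len(observed)  # 1000 per digit if there is no bias

# Deviation of each digit from its expected frequency.
deviations = [o - expected for o in observed]

# Goodness-of-fit statistic: squared deviations over expectation, summed.
# (Assumed here for illustration; the text only remarks on the bias.)
chi_square = sum(d * d / expected for d in deviations)

# Totals for the even digits (0, 2, 4, 6, 8) and the odd digits.
even_total = sum(observed[0::2])
odd_total = sum(observed[1::2])

for digit, (o, d) in enumerate(zip(observed, deviations)):
    print(f"{digit}: {o:5d}  {d:+7.1f}")
print(f"even digits: {even_total}, odd digits: {odd_total}")
print(f"goodness-of-fit statistic: {chi_square:.3f}")
```

Every even digit shows a positive deviation, and the largest negative deviations belong to 1, 3 and 9, in agreement with the remarks above.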


Example 8.4 

Every year a number of crop reporters in England and Wales estimate the prospective 
yields of certain crops, forecasts being obtained at different periods of the year and final
estimates when the crop is harvested. Table 8.4 shows the average estimated yield of 
potatoes at the various times for the years 1929-1936. 

This table exhibits very clearly an effect which has shown itself in nearly all the English 
crop reports (and appears also in other countries), namely, the chronic pessimism of crop 
forecasts. In every case but one in the table the forecasts are below the final estimate. 
Nor do crop reporters seem able to learn by experience that they are underestimating. 
Nothing in this table indicates that the differences between forecast and final estimate 
diminished during the period concerned.
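The count of forecasts falling below the final estimate can be checked directly from the yields. The sketch below is illustrative only: the yields are as read from Table 8.4, the 1 October figure for 1931 is taken as 5.3 (an assumption, chosen to agree with the printed difference of 3.6 per cent), and the recomputed percentages need not agree with the printed ones in every cell, since the published yields are themselves rounded.

```python
# Yields in tons per acre as read from Table 8.4:
# year -> (Sept. 1st, Oct. 1st, Nov. 1st forecasts, final estimate).
# Assumption: the Oct. 1st figure for 1931 is read as 5.3.
yields = {
    1929: (5.7, 6.2, 6.5, 6.9),
    1930: (6.0, 6.1, 6.1, 6.5),
    1931: (5.5, 5.3, 5.3, 5.5),
    1932: (6.4, 6.2, 6.3, 6.6),
    1933: (6.4, 6.2, 6.4, 6.7),
    1934: (6.0, 6.3, 6.7, 7.1),
    1935: (5.6, 5.7, 6.0, 6.2),
    1936: (6.0, 5.9, 5.8, 6.2),
}


def pct_difference(forecast, final):
    """Percentage difference of a forecast from the final estimate."""
    return 100.0 * (forecast - final) / final


below = 0   # forecasts strictly below the final estimate
count = 0   # forecasts examined
for year, (*forecasts, final) in sorted(yields.items()):
    diffs = [pct_difference(f, final) for f in forecasts]
    below += sum(f < final for f in forecasts)
    count += len(forecasts)
    print(year, " ".join(f"{d:+6.1f}" for d in diffs))

print(f"{below} of {count} forecasts fall below the final estimate")
```

On these figures the only forecast not below the final estimate is that of 1 September 1931, which equals it.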
It should also be noticed that these estimates are the weighted average of a large number
of independent observations. One of the commoner misunderstandings in this type of work
is based on the supposition that, though individuals may make mistakes, their errors will
cancel out in the aggregate. Our present example shows this to be untrue in general.
There can appear a systematic bias affecting all the individuals performing estimates.

TABLE 8.4

Bias in Crop Forecasting. Forecasts of Yields of Potatoes in England
and Wales (Tons per Acre).
(From the official agricultural statistics.)

"% Diff." is the percentage difference of the forecast from the final estimate.

           Sept. 1st           Oct. 1st            Nov. 1st          Final
Year    Yield   % Diff.     Yield   % Diff.     Yield   % Diff.    Estimate
1929     5.7    -17.4        6.2    -10.1        6.5     [...]        6.9
1930     6.0     [...]       6.1     -6.2        6.1     -6.2         6.5
1931     5.5      0.0        5.3     -3.6        5.3     -3.6         5.5
1932     6.4     -3.0        6.2     -6.1        6.3     -4.5         6.6
1933     6.4     -2.5        6.2     [...]       6.4     [...]        6.7
1934     6.0    -15.5        6.3     [...]       6.7     -5.6         7.1
1935     5.6     -9.7        5.7     [...]       6.0     [...]        6.2
1936     6.0     [...]       5.9     -4.8        5.8     -6.5         6.2

8.5. The foregoing examples are enough to indicate that human bias is very prevalent.
Trained observers may be biased even when conscious of their own imperfections; different
observers may be biased in different ways in similar circumstances; and the same observer
may be biased in different ways in different circumstances. It is abundantly clear that we
must look for true randomness elsewhere [...] on the part of human
observers. There may be persons [...] finely balanced that [...] have experimented in
this interesting field would regard themselves as among them.


8.6. In Chapter 7 we saw that the primary function of randomness in probability was
that it ensured that certain primitive events were equally probable. We may say that
a method of selection is random for a population U if, when applied to U, it gives all members
an equal probability of being chosen; or, in the language of frequency, if, when continually
applied to U, it educes the members approximately equally frequently.

But this is not enough. Suppose we had a population [...] sampled with replacement.
Then a method which [...] the series ABAB . . . educes each member approximately [...]
not what we customarily mean by a random method. [...] is that in such circumstances
it should produce a series like [...]

8.7. A further point is to be noted. We may, in drawing the sample, be interested in
one particular variate exhibited by the members, and it is possible that a method may give
a satisfactory random sample so far as this variate is concerned without doing so for other
variates. Suppose, for example, we are anxious to take a random sample from the
inhabitants of a particular street. If we are concerned with a variate such as eye-colour it 
might be sufficient to choose a house every so often, say every tenth house, and take the 
inhabitants of that house as part of the sample. Such a method would not give every 
inhabitant an equal chance of being chosen; but if we look back to the time when the 
inhabitants took up residence we may imagine that the colour of their eyes did not influence 
their geographical distribution, and thus that if we consider the allocation of the inhabitants 
in some way independent of eye-colour, and then take every tenth house, we may suppose 
that so far as eye-colour is concerned the sample is random. But the matter would stand 
differently if we were sampling for income. If for instance every tenth house was a corner 
house and thus inhabited by a person of more than average income, our sample would no 
longer be random with respect to income. Looking back, as before, to the time when 
inhabitants took up residence, we see that they can no longer be regarded as distributed at 
random, for those with larger incomes will tend to be attracted towards the more expensive 
houses. 

Thus a method which is random for one population may not be so for another ; and even 
in the same population a method random for one variate may not be so for another. 
Randomness is relative. 
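The relativity of randomness may be illustrated by a small invented street. In the sketch below every figure is hypothetical: the incomes, the rule assigning eye-colour, and the supposition that every tenth house is a corner house are assumptions made solely for illustration.

```python
# Hypothetical street of 100 houses, numbered 1 to 100, one person per house.
# Every tenth house is a corner house occupied by a person of larger income.
houses = list(range(1, 101))


def income(house):
    """Invented incomes: corner houses (every tenth) carry a larger income."""
    return 900 if house % 10 == 0 else 300


def has_blue_eyes(house):
    """Invented rule for eye-colour, unrelated to the corner positions."""
    return house % 3 == 0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# The method of selection: take every tenth house.
sample = houses[9::10]

pop_income = mean(income(h) for h in houses)
sample_income = mean(income(h) for h in sample)
pop_blue = mean(has_blue_eyes(h) for h in houses)
sample_blue = mean(has_blue_eyes(h) for h in sample)

print(f"mean income:  population {pop_income:.0f}, sample {sample_income:.0f}")
print(f"blue eyes:    population {pop_blue:.2f}, sample {sample_blue:.2f}")
```

The same method of selection gives a fair account of eye-colour and a grossly misleading one of income.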


The Technique of Random Sampling


8.8. Suppose, then, that we are given a population and a variate is specified. How are 
we to draw a random sample, i.e. how can we find a method which is random for that population
and that variate? The answer lies partly in theory and partly in practice.

(a) In the first place we must require that there is no obvious connection between the 
method of selection and the properties under consideration. The method and the properties 
must be independent so far as our prior knowledge is concerned. In sampling a field of 
wheat for shoot height, for example, we must not use a method which could be influenced by 
that height, such as skimming a hoop over the field and selecting the plants round which it 
fell (for the hoop might tend to catch on the taller plants). Again, in sampling the 
inhabitants of a town by choosing names from a telephone directory we should undoubtedly 
tend to get the more well-to-do classes and hence, if the variate under consideration is wealth 
or any related characteristic such as number of children, political opinion, standard of
education and so on, the sample would not be random. If we were concerned with characteristics
such as height, hair colour, or blood group the sample might be random, though it is
not difficult in many similar cases to think of reasons why the variate might be linked with 
wealth. 

If this matter is viewed from the standpoint of the axiomatic theory of probability the 
absence of knowledge about relationship between the method and the characteristic under 
consideration may be sufficient to ensure randomness, for the probabilities of elementary 
propositions then become equal—the probabilities being measures of prior attitudes of mind.* 
But if the frequency viewpoint is adopted it is not enough that there should be absence of 
knowledge of this kind, for unknown to the observer there may be relations which will prevent 
the elementary propositions from being true in approximately equal proportions. The 
presumption is that if we make as great an effort as possible to ascertain whether any relationship
exists and fail to find it, there is no relationship; and hence we can assume randomness
with more or less confidence. But in this approach the assumption of randomness is
ultimately part of the general uncertainty of the inference from sample to population.

* At least, this is my interpretation of the position; but the writers on the axiomatic theory have
not discussed randomness at any length, being content to define it in terms of probability, and I may
be putting a gloss on their views which they would not accept.

(b) Secondly, we may rely on previous experience of a random method to justify its use 
on new occasions. This is evidently an extrapolation, and though most people would regard 
it as reasonable, the fact has to be realised. The axiomatic theory of probability can embrace 
this extrapolation within its scope, for the probabilities given by the method are assessable 
in terms of prior knowledge ; but the frequency theory has to take the extrapolation as an 
additional assumption. 

8.9. One of the most reliable methods of drawing random samples consists of constructing
a model of the population and sampling from the model. We may, for instance, note
down the characteristics of each member on a card and sample by choosing cards from the
pack corresponding to the whole population. This is the method adopted in lotteries and
the process is known as lottery or ticket sampling. It is moderately effective but suffers in
practice from two disadvantages: the labour of constructing the card population, and the
danger of bias in the drawing of cards. Example 12.1 below, for instance, shows that the
ordinary processes of shuffling and dealing playing-cards may fail to be satisfactory. To be
reasonably satisfied about the randomness of the shuffling entails a good deal of trouble and
labour, and the same object can be attained much more simply by the use of random sampling
numbers, which we now consider.

Random Sampling Numbers 


8.10. The easiest way of constructing a miniature population is to attach an ordinal
number to each member, most simply by numbering the members from 1 onwards. The
set of ordinals so obtained is the miniature population and the problem of drawing a random
sample reduces to finding a series of random numbers. The advantages of this method are
obvious: no physical model population has to be constructed; the numbering can be carried
out in any convenient manner; and the series of random numbers can be applied to any
enumerable population so that any series of random numbers has a very wide range of
application.

[...] random with respect to those characteristics. The [...] ordinals to the population,
[...] practice a procedure of this [...] of numbering the population in any convenient way, related
to the characteristics or not, and then seek for a set of numbers which are a random set from
the possible ordinals of the population.

8.11. One of the more obvious ways of drawing random samples from an enumerated
population is to use haphazard numbers taken from some tot[...]. Suppose,
for instance, we wished to take a sample from the visible stars i[...]
small complications due to the existence of double stars an[...] at is then [...] it seems
[...] place-names [...] between the
distribution of stars in the sky and the distribution of places on the Earth's surface. A little
reflection, however, will show that the method is unsound. There are large stretches of 
territory and sea on the Earth which have no place-names on them—the poles, deserts and 
oceans ; consequently no numbers will occur for these regions and there will be corresponding 
areas on the celestial sphere which have no chance of being included. 

8.12. As a next attempt we might take a book containing a number of digits, e.g. 
a telephone directory, or a set of statistical tables or mathematical tables, open it at hazard 
and choose the digits which first strike the eye, or which occur at the top of the page, and so 
on. This is an improvement, but it is still open to some objection. 

(a) Telephone directories. Table 1.4 on page 6 shows the distribution of 10,000 digits
taken from the London telephone directory. Pages were chosen by opening the directory 
haphazardly, numbers of less than four digits and numbers in heavy type were ignored ; and 
of the four-figure numbers remaining the two right-hand ones were taken for all numbers on 
the page. If the numbers were random we should expect about 1000 of each digit in the total 
of 10,000. Actually there are very considerable deviations from this expectation, and we shall 
see in a later chapter that they cannot be explained as sampling fluctuations. There are 
significant deficiencies in 5’s and 9's, due to several causes such as the tendency to avoid 
these digits because they sound alike, the reservation of numbers ending in 99 for testing 
purposes by telephone engineers and so on. It is evident that tables of random numbers 
could not be constructed from directories such as this. 

(b) Mathematical tables. Evidently care has to be exercised in using mathematical
tables in constructing random series. Suppose, for instance, we take a set of logarithm
tables. There are clearly relationships between successive logarithms, expressible by the
fact that differences are approximately constant if the interval is small. Moreover there is
a very curious theorem about digits in certain classes of table which throws theoretical doubt
on the method. Consider the logarithms to base 10 of the natural numbers from 1 onwards. 
Suppose we choose the kth digit in each and so obtain a series of numbers 0-9. Then the 
proportional frequency of any digit in this series does not tend to a limit as the length of the 
series increases, whatever k may be.* Just what does happen does not appear to be known, 
but it would seem that certain systematic effects begin to show themselves and these will 
obviously endanger the randomness of the series. 

(c) Statistical tables. If we have a volume of statistics such as populations of towns 
and rural districts there are some grounds for supposing that if the numbers are large—say 
four figures or more—the final digits will be random. Here again, however, the use of such 
tables requires care—they may have been compiled by an observer with number preferences, 
and some rounding up may have taken place.

* J. Franel, Vierteljahrschrift der Naturforschenden Gesellschaft in Zürich (1917), 62, 286. So
great a mathematician as Poincaré made a mistake on this point.


8.13. However, the necessity for the ordinary student to construct random series of
his own has been obviated by the publication of various tables of Random Sampling Numbers.
There are three such available:

(a) Tippett's numbers comprise 41,600 digits taken from census reports combined into
fours to make 10,400 four-figure numbers (Tracts for Computers, No. 15).

(b) Kendall and Babington Smith's numbers comprise 100,000 digits grouped in twos
and fours and in 100 separate thousands (Tracts for Computers, No. 24). These numbers were
obtained from a machine specially constructed for the purpose on the lines very briefly
described in Example 8.3.

(c) Fisher and Yates' numbers comprise 15,000 digits arranged in twos (Statistical Tables
for Biological, Agricultural and Medical Research). These numbers were obtained from the
15th-19th digits in A. J. Thompson's tables of logarithms and were subsequently adjusted,
it having been found that there were too many sixes.


Before considering the basis of these tables it may be helpful to give some examples of
their use. Here are the first 200 of the Kendall-Babington Smith tables:

TABLE 8.5

Random Sampling Numbers.
(Tracts for Computers, No. 24.)

23 15  75 48  59 01  83 72  59 93  76 24  97 08  86 95  23 03  67 44
05 54  55 50  43 10  53 74  35 08  90 61  18 37  44 10  96 22  13 43
14 87  16 03  50 32  40 43  62 23  50 05  10 03  22 11  54 38  08 34
38 97  67 49  51 94  05 17  58 53  78 80  59 01  94 32  42 87  16 95
97 31  26 17  18 99  75 53  08 70  94 25  12 58  41 54  88 21  05 13


Example 8.5 


To draw a sample of 10 men from the population of 8585 men of Table 1.7. 

The first process is to number the population ; and here, as in most similar cases, one 
numbering has already been provided by the frequency-distribution. We take numbers 
1 and 2 to be those in the group 57- inches, numbers 3 to 6 those in the group 58-, and so on,
those in the group 77- inches being numbers 8584 and 8585.

Now we take 10 four-figure numbers from the tables, e.g. reading across in Table 8.5
we have

2315, 7548, 5901, 8372, 5993, 7624, [9708], [8695], 2303, 6744, 0554, 5550.

The two numbers in square brackets are greater than 8585 and we ignore them. We
now select the individuals corresponding to the remaining 10 numbers. They will be found
to be in the intervals 65-, 70-, 68-, 72-, 68-, 70-, 65-, 69-, 63-, 68- inches respectively.

The mean of these values considered as located at the centres of intervals is 68.24, as
against a value in the population of 67.46.
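The selection of the ten numbers in Example 8.5 amounts to a simple rule of rejection, which may be sketched as follows. The function name `draw_sample` is hypothetical; the allocation of the chosen numbers to height groups is not attempted, since it requires the frequencies of Table 1.7.

```python
# Four-figure numbers read across Table 8.5.
numbers = ["2315", "7548", "5901", "8372", "5993", "7624",
           "9708", "8695", "2303", "6744", "0554", "5550"]


def draw_sample(numbers, population_size, sample_size):
    """Keep numbers lying in 1..population_size until sample_size are found.

    Numbers outside the range are ignored. Repeated numbers are kept, so
    the sketch corresponds to sampling with replacement.
    """
    chosen = []
    for text in numbers:
        value = int(text)
        if 1 <= value <= population_size:
            chosen.append(value)
        if len(chosen) == sample_size:
            break
    return chosen


sample = draw_sample(numbers, population_size=8585, sample_size=10)
print(sample)
```

The two numbers 9708 and 8695 are passed over, and the remaining ten are those used in the example.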


Example 8.6 


To draw a sample of 12 from the population in the following bivariate table, showing
the relation between inoculation and attack in cholera.

                    Not Attacked.     Attacked.      TOTAL.

Inoculated               276              3            279
                     (0001-3312)     (3313-3348)

Not inoculated           473             66            539
                     (3349-9024)     (9025-9816)

TOTALS                   749             69            818


There are now 818 members. We could, of course, take three-figure numbers from
the tables, obtaining, e.g. from Table 8.5

231, 575, 485, etc.

But this is rather troublesome as the numbers are not grouped in threes. It is more convenient
to take four-figure numbers as before and to associate each member of the population
with 12 numbers in the tables, e.g. the first would correspond to 0001-0012, the second
to 0013-0024, and so on. We then get the numbers shown in brackets in the above
table. The number 0000 and numbers above 9816 we ignore as before.

The two numbers omitted in the previous example can now be used, and we find the
following results:


                    Not Attacked.     Attacked.      TOTAL.
Inoculated                3               0             3
Not inoculated            8               1             9
TOTALS                   11               1            12


Here, for example, the member corresponding to the number 2315 falls in the not-attacked:
inoculated class, and so on.

It has so happened in this example that no member in the very small inoculated:
attacked class has been selected. Suppose we had had a series containing

3314, 3323, 3333, 3341.

All these fall into the group and there are four of them, as against only three members
in the population. Had we been confronted with this position we should have had to
decide whether the sampling was to be with or without replacement. If it was without
replacement, we should have to suppose that the first three numbers in the group 3313-3348
exhausted that part of the population and ignore all numbers of the group occurring subsequently.
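The allocation in Example 8.6 can be reproduced mechanically. The following sketch assumes the cell frequencies of the cholera table (the frequency 473 being inferred from the marginal totals) and the bracketed ranges 0001-3312, 3313-3348, 3349-9024 and 9025-9816; the names `cells` and `classify` are hypothetical.

```python
# Cells of the cholera table with their frequencies, in the order in which
# the bracketed ranges were assigned. The value 473 is inferred from the
# marginal totals (749 - 276 and 539 - 66).
cells = [
    ("inoculated, not attacked", 276),
    ("inoculated, attacked", 3),
    ("not inoculated, not attacked", 473),
    ("not inoculated, attacked", 66),
]
NUMBERS_PER_MEMBER = 12  # each member owns 12 consecutive four-figure numbers


def classify(number):
    """Return the cell for a four-figure number, or None if it is ignored."""
    upper = 0
    for name, frequency in cells:
        upper += frequency * NUMBERS_PER_MEMBER
        if 1 <= number <= upper:
            return name
    return None  # 0000 and numbers above 9816 are not used


# The twelve four-figure numbers read across Table 8.5.
drawn = [2315, 7548, 5901, 8372, 5993, 7624, 9708, 8695, 2303, 6744, 554, 5550]

counts = {name: 0 for name, _ in cells}
for number in drawn:
    cell = classify(number)
    if cell is not None:
        counts[cell] += 1

print(counts)
```

The sketch treats sampling as with replacement; for sampling without replacement a record of the members already drawn would have to be kept, as the text describes.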


Example 8.7 

To construct a series of random permutations of the numbers 1 to 5. 

Here we are not concerned with the digits 0, 6, 7, 8 and 9 and so ignore them in the 
table of random numbers. We read through the table and note the digits as they occur, 
e.g. in Table 8.5 we have 2315, 7548, etc. The 7 is to be ignored and also the second 5,
for one 5 has already occurred. We then reach the permutation 23154. Then we start 
again, the next series being 8, 5901, 8372, 5993, 7624, etc., giving the permutation 51324; 
and so on. 
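The rule of Example 8.7 is easily expressed as a procedure. The sketch below is illustrative; the function name `permutations_from_digits` is hypothetical, and the digit stream is the first six four-figure groups of Table 8.5.

```python
def permutations_from_digits(digits, symbols="12345"):
    """Yield successive permutations of `symbols` read off a digit stream.

    Digits not among the symbols are ignored, as are digits that have
    already occurred in the permutation under construction.
    """
    current = []
    for d in digits:
        if d in symbols and d not in current:
            current.append(d)
        if len(current) == len(symbols):
            yield "".join(current)
            current = []


# 2315 7548 5901 8372 5993 7624, read across Table 8.5.
stream = "231575485901837259937624"
result = list(permutations_from_digits(stream))
print(result)
```

The two permutations obtained are those given in the example, namely 23154 and 51324.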

Example 8.8 


To take a random sample of 10 from the normal population

$$dF = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^{2}}\, dx.$$

This is a particularly interesting case, for we have to select a sample from an infinite
population. Such a process, as has been seen, can only be considered as a limiting one.


Consider the frequencies of the normal curve in ranges of 0.1 on each side of the mean.
These may be obtained very simply from tables of the normal integral by differencing
and in fact are given in many tables of that integral, e.g. that of Appendix Table 2. Suppose
the frequencies rounded up to four places of decimals, e.g. those near the mean would be

0.0-    0.0398
0.1-    0.0394
0.2-    0.0387
0.3-    0.0375, etc.

and the total frequencies are given by the normal integral itself, e.g.

Upper Limit     Frequency up       Upper Limit     Frequency up
of Interval.    to that Limit.     of Interval.    to that Limit.

   0.0            0.5000              -0.1           0.4602
   0.1            0.5398              -0.2           0.4207
   0.2            0.5793              -0.3           0.3821
   0.3            0.6179, etc.        -0.4           0.3446, etc.

We may now attach a four-figure random number to this population, which is finite and
discontinuous: e.g. the number 5461 corresponds to a variate-value +0.1- and the number
3500 to -0.4-.

Had we taken the table to n places of decimals we should have required n-figure
numbers. Furthermore, we can make the approximation more exact by taking a finer
variate interval. Such matters as this are to be decided in the light of the degree of
approximation required.
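The correspondence between four-figure numbers and variate intervals in Example 8.8 may be sketched as follows. The sketch assumes the standard normal population and computes the integral from the error function instead of reading it from Appendix Table 2; the convention that the number r stands for the cumulative frequency r/10000, and the function names, are assumptions of the sketch.

```python
import math


def normal_cdf(x):
    """Standard normal integral from minus infinity to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def interval_for(number, limit_tenths=60):
    """Map a four-figure number to the lower end, in tenths, of its interval.

    The number r is taken to stand for the cumulative frequency r / 10000,
    and the intervals are of width 0.1. None is returned if the frequency
    falls in the extreme tails beyond +/- limit_tenths / 10.
    """
    p = number / 10000.0
    for k in range(-limit_tenths, limit_tenths):
        if normal_cdf(k / 10.0) <= p < normal_cdf((k + 1) / 10.0):
            return k
    return None


# Frequency of the interval 0.0-, obtained by differencing the integral.
first_frequency = normal_cdf(0.1) - normal_cdf(0.0)

print(interval_for(5461) / 10)   # lower end of the interval for 5461
print(interval_for(3500) / 10)   # lower end of the interval for 3500
print(round(first_frequency, 4))
```

A sample of 10 is then obtained by applying `interval_for` to ten four-figure numbers from the tables; a finer interval requires only a smaller step in the loop.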


8.14. Random Sampling Numbers must obey certain conditions before they can be
used. Any set of numbers whatever is random in the sense that it might arise, with however
great improbability, from random sampling; but such a set might not be suitable as a
table of Random Sampling Numbers. From the examples already given it is clear that
we desire such a table to have very great flexibility. It should give random results in as
many cases as possible, whether used in part or in whole.

Now it is impossible to construct a table of Random Sampling Numbers which will
satisfy this requirement entirely. Suppose, to take an extreme case, we constructed a
table of $10^{10^{10}}$ digits. The chance of any digit being a zero is $\frac{1}{10}$ and thus the chance that
any given block of a million digits are all zeros is $10^{-10^{6}}$. Such a set should therefore arise
fairly often in the set of $10^{(10^{10}-6)}$ blocks of a million. If it did not, the whole set would
not be satisfactory for certain sampling experiments. Clearly, however, the set of a million
zeros is not suitable for drawing samples in an experiment requiring less than a million
digits.

Thus, it is to be expected that in a table of Random Sampling Numbers there will
occur patches which are not suitable for use by themselves. The unusual must be given
a chance of occurring in its due proportion, however small. Kendall and Babington Smith
attempted to deal with this problem by indicating the portions of their table (5 thousands
out of 100) which it would be better to avoid in sampling experiments requiring fewer
than 1000 digits.


8.15. If a table of random numbers is used to draw members from a population
of ten, we expect the members to appear in approximately equal proportions. In other
words we expect such a table to contain the ten digits 0-9 in approximately equal proportions.
Similarly we expect the hundred pairs 00-99 to appear in approximately equal
proportions, and so on. Various tests of this kind, based on a comparison between actual
frequencies and those required to satisfy the laws of probability, can be devised. No
table can satisfy them all, but if it satisfies tests which (a) ensure the randomness of the
numbers for the commoner types of sampling inquiry for which it is likely to be used and
(b) are capable of revealing any particular sort of bias to which the numbers are susceptible
in virtue of their mode of formation, it is likely to be of general application.
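Counts of the kind described are readily made. The following sketch tabulates the single digits and the non-overlapping pairs in the 200 digits of Table 8.5 as transcribed here; it is an illustration only, and the statistic computed at the end (squared deviations over expectation, summed) is an assumption of the sketch, not one of the published tests.

```python
from collections import Counter

# The 200 digits of Table 8.5, row by row, as transcribed here.
rows = [
    "2315754859018372599376249708869523036744",
    "0554555043105374350890611837441096221343",
    "1487160350324043622350051003221154380834",
    "3897674951940517585378805901943242871695",
    "9731261718997553087094251258415488210513",
]
digits = "".join(rows)

# Frequency of each single digit 0-9.
digit_counts = Counter(digits)

# Frequency of each non-overlapping pair 00-99.
pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
pair_counts = Counter(pairs)

# Comparison of the digit frequencies with equal expectation.
expected = len(digits) / 10
statistic = sum(
    (digit_counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10)
)

for d in range(10):
    print(d, digit_counts.get(str(d), 0))
print("distinct pairs occurring:", len(pair_counts), "out of 100")
print(f"goodness-of-fit statistic for digits: {statistic:.2f}")
```

With only 100 pairs in hand many of the hundred possible pairs cannot occur at all, so that a test on pairs calls for a much longer run of the table.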

For a more detailed discussion of these matters and the results of tests on the Tippett 
tables, the Kendall-Babington Smith tables and the Fisher-Yates tables, reference may be 
made to the works listed at the end of the chapter. 


Sampling from a Continuous Population

8.16. Random Sampling Numbers offer the best method known at the present time
of drawing random samples from an enumerable universe, and as was seen in Example 8.8,
may also be used to draw samples from a continuous population specified mathematically.
But cases sometimes occur in which they cannot be employed. For instance, if we wish 
to take a sample of milk or flour, we cannot in practice number each particle and extract 
it from the population for examination. In such cases we are usually compelled to fall 
back on more intuitively grounded procedure. To take a random sample from a milk 
churn, for example, we might stir the contents thoroughly and scoop up a sample hap- 
hazardly. Sometimes, when the population is of manageable size, we can proceed system- 
atically by dividing it into a number of parcels and selecting parcels by the ordinary 
technique of random numbers. Most sciences have their own peculiar sampling problems 
and no attempt can be made here to discuss them all. At this point we leave the technique 
of random sampling and assume hereafter, unless the contrary is stated, that the material 
we are discussing has been obtained by a random process. 


Sampling from Attributes

8.17. As an introduction to the general sampling problems we shall consider the
sampling of attributes, which raises all the difficulties of principle but is not obscured by
too much mathematics.

Suppose we have a random sample from a population whose members all exhibit either
an attribute A or its negative not-A. Our sample is n in number, and a proportion p, or
a number pn, exhibit the attribute; and consequently a proportion q, or number qn
(p + q = 1) do not. We will assume that the population is large, or that sampling is with
replacement, so that the probability of obtaining an A at any drawing is not affected by
other drawings and is therefore a constant, say w.

The problems we have to consider are of three types :— 

(a) Suppose we have some reason for supposing that the proportion of A's in the
population is given by a known w. Does the observed proportion p bear out this hypothesis
or is it so divergent from w as to lead us to doubt the hypothesis? In an experiment with
plants exhibiting two strains of a quality such as height in pea plants, we may wish to
test whether the breeding follows the simple Mendelian law of dominant and recessive.
If we begin with two pure strains tall and short, cross-breed a first generation and then
produce a second generation by interbreeding, the proportional frequencies of "short"
and "tall" in this generation will be ¾ and ¼ if "short" is dominant and ¼ and ¾ if "tall"
is dominant, provided that the simple Mendelian law holds. Suppose we carry out such [...]

(c) Then, having estimated it, we wish to know the degree of reliability of the estimate.
How far is the estimate likely to deviate from the real value of w?


[...] remains constant, [...] be arrayed by the binomial $(\chi + w)^{n}$, that [...] obtaining
pn A's and qn not-A's in a sample of n is the term in $w^{pn}\chi^{qn}$ in $(\chi + w)^{n}$, i.e.

$$\binom{n}{pn}\, w^{pn} \chi^{qn},$$

where $\chi = 1 - w$. Thus the probability of obtaining pn or fewer A's is the sum of the
first pn + 1 terms in this binomial. If this probability is small we have the choice of
three possibilities:

(a) An improbable event has occurred.
(b) The hypothesis is not true, i.e. the proportion of A's in the population is not w.
(c) The sampling process is not random.

We can usually exclude (c) by taking care with the sam[...] to balance (a) and (b).
It is in [...] chapter that we dismiss a h[...]


Example 8.9 


In certain coin-tossing experiments a coin was tossed 20 times and came down heads
15 times. Does this conflict with the hypothesis that the coin was unbiased?

The hypothesis we have to test here is that $w = \frac{1}{2}$. [...] we should get 5 tails or fewer
is the sum of the first 6 terms in $(\frac{1}{2} + \frac{1}{2})^{20}$ [...] from Table 5.2 to be 0.0207. Thus the
probability of such an event is small (the odds are 50 to 1 against the event) and we suspect
the hypothesis. [...] other hand, we had supposed the value of w t[...] 6 terms of
$(0.3 + 0.7)^{20}$ to be 0.4163 [...] should not have rejected the hypothesis.

8.19. In the example just given we p[...] that the terms of the binomial could be
calculated [...] large (100 or more) and the evaluation [...] most tedious. We can, if
complete a[...] given in 5.7, and evaluate from the in[...]


In fact, for many purposes, it is not even necessary to carry out the actual evaluations.
From the tables of the integral (Appendix Table 2) we note that the probability of a
deviation as great or greater in absolute value than the standard deviation is 0.3173; than
twice the s.d. is 0.0455; than thrice the s.d. is 0.0027; than four times the s.d. is 0.00006.
Thus if we find np differs from nw by more than twice $\sqrt{n w \chi}$ (where $\chi = 1 - w$) we begin
to doubt the hypothesis, and if the difference is more than thrice $\sqrt{n w \chi}$ we may
confidently reject it.

Similarly, if the proportion p differs from w by more than twice $\sqrt{w\chi/n}$ we begin to
doubt the hypothesis, and so on. It makes no difference whether we compare the actual
frequencies or the proportions.
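The four probabilities quoted from the tables of the integral can be recovered from the complementary error function. The following is a sketch only; the function name `two_sided_tail` is hypothetical.

```python
import math


def two_sided_tail(t):
    """Probability that a normal deviate exceeds t standard deviations
    in absolute value."""
    return math.erfc(t / math.sqrt(2.0))


tails = {t: two_sided_tail(t) for t in (1, 2, 3, 4)}
for t, prob in tails.items():
    print(f"{t} s.d.: {prob:.5f}")
```

The values agree with those of the text to the number of places there given.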


Example 8.10 

In some dice-throwing experiments Weldon threw dice 49,152 times, and of these 
25,145 yielded a 4, 5, or 6. Is this consonant with the hypothesis that the dice were 
unbiased ? 

If the dice are unbiased the probability of a 4, 5 or 6 is $\frac{1}{2}$. Thus nw is 24,576 and the
observed np is 569 in excess of this value.

$$\sqrt{n w \chi} = \sqrt{\tfrac{1}{2} \times \tfrac{1}{2} \times 49{,}152} = 110.9.$$


The observed deviation is more than 5 times this quantity and we accordingly suspect 
very strongly that the dice were biased. 
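The arithmetic of Example 8.10 is set out below as a short sketch; the variable names are hypothetical.

```python
import math

n = 49152          # number of throws
observed = 25145   # throws yielding a 4, 5 or 6
w = 0.5            # hypothetical proportion if the dice are unbiased

expected = n * w                          # nw
std_error = math.sqrt(n * w * (1 - w))    # square root of n * w * (1 - w)
excess = observed - expected
ratio = excess / std_error

print(f"expected {expected:.0f}, excess {excess:.0f}")
print(f"standard error {std_error:.1f}, ratio {ratio:.2f}")
```

The excess is rather more than five times the standard error, as stated.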


Standard Error 

8.20. The quantity $\sqrt{n w \chi}$ is a particular case, appropriate to the binomial, of an
important statistical concept known as the standard error. It is the standard deviation of 
the sampling distribution of the statistic np. It is particularly important in the class of 
cases, which is relatively large, wherein that sampling distribution can be taken to be 
normal either exactly or to an adequate degree of approximation. 


8.21. Let us now turn to the case in which no value of w is given a priori. If the
sample gives a proportion of A’s equal to p, what shall we take as our estimate of w? 
The most obvious course is to take p itself; and this is the course dictated by the more 
sophisticated ideas described in Chapter 7. 

Consider first of all the method of maximum likelihood. The probability of obtaining
np A's and nq not-A's is

$$\binom{n}{np}\, w^{np} (1 - w)^{nq}. \qquad (8.1)$$

This is proportional to the likelihood, and neglecting constants we have to maximise

$$L = k\, w^{np} (1 - w)^{nq}$$

for variations of w. We have, since L is non-negative, that $\dfrac{\partial L}{\partial w}$ and
$\dfrac{\partial}{\partial w}(\log L) = \dfrac{1}{L}\dfrac{\partial L}{\partial w}$
vanish together and it is therefore sufficient to maximise log L. We have

$$\frac{\partial}{\partial w} \log L = \frac{np}{w} - \frac{nq}{1 - w} = 0,$$

giving

$$w = p. \qquad (8.2)$$


The method of Bayes will give the same result if we suppose the possible values of
w equally distributed between 0 and 1 for $dw \to 0$. For then

$$P(dw \mid p) \propto P(dw)\, P(p \mid w) \qquad (8.3)$$

$$\propto w^{np} (1 - w)^{nq}\, dw, \qquad (8.4)$$

which, as before, is maximised when w = p.
There is another way of looking at this problem of estimation. Suppose we took a
large number of samples from the population with a proportion of w A's and $\chi = 1 - w$ not-A's.
Our estimate of w would be p in each case, p varying from sample to sample; and the mean
value of all such estimates would be

$$E(p) = \sum_{j=0}^{n} \frac{j}{n} \binom{n}{j} w^{j} \chi^{n-j} = w(\chi + w)^{n-1} = w,$$

so that the mean value of our estimate over all possible samples is w. Such an estimate
is called unbiased: if we follow the rule of estimation the average of our estimates in a
large number of cases will be exactly the correct value w. It may thus be argued that the
unbiased estimate should be taken as a reliable estimate of w.
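That the mean of p over all possible samples is exactly w may be verified numerically for any particular case. The sketch below does so by summing over the binomial distribution; the function name `mean_of_estimate` and the particular values of n and w are chosen arbitrarily for illustration.

```python
from math import comb


def mean_of_estimate(n, w):
    """Exact mean of p = j/n, where j follows the binomial with index n
    and proportion w."""
    return sum(
        (j / n) * comb(n, j) * w**j * (1 - w) ** (n - j) for j in range(n + 1)
    )


for n, w in [(20, 0.3), (7, 0.5), (100, 0.05)]:
    print(n, w, round(mean_of_estimate(n, w), 12))
```

In each case the mean coincides with w, apart from the rounding of the arithmetic.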


8.22. In this case, therefore, all the approaches lead to the same conclusion (a 
happy state of affairs which, as we shall see in the sequel, does not always exist). Consider 
now the next stage of the problem: what is the reliability of the estimate? In other 
words, how far is the estimate likely to differ from the true value ? 


We know that if the sample value p differs from w by $t\sqrt{w(1-w)/n}$, the probability of the
difference becomes smaller as t increases. Thus, with an assigned degree of probability
we can say that it is improbable that p will differ from w by more than an assigned amount.
But to specify this amount exactly we require to know w; and this is precisely the quantity
we are trying to find.

The problem can only be solved as an approximation. If n is large the standard
error of p is of the order $n^{-1/2}$, so that we may put

$$p = w + \frac{k}{\sqrt{n}}.$$

Then

$$\sqrt{\frac{pq}{n}} = \sqrt{\frac{1}{n}\left(w + \frac{k}{\sqrt{n}}\right)\left(1 - w - \frac{k}{\sqrt{n}}\right)} = \sqrt{\frac{w(1 - w)}{n}},$$

neglecting terms of order $n^{-1}$.

Thus for large n the standard error of p is approximately equal to $\sqrt{pq/n}$, and we thus
reach the fundamental result that in large samples of attributes the standard error may be
calculated by using the estimates of the parameters under estimate instead of the (unknown)
values of those parameters themselves.


Example 8.11 

In a sample of 600, 240 are found to possess the attribute A. Thus p = 0.40, np = 240,
√(npq) = 12. We can thus regard it as somewhat improbable that nw differs from 240
by more than twice this amount, 24, and highly improbable that it differs by more than 36.
We thus can say with some confidence that nw lies in the range 240 ± 24 and with great
confidence that it lies in the range 240 ± 36.
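
The arithmetic of Example 8.11 can be reproduced in a few lines. This is only a sketch of the calculation described above; the variable names are illustrative.

```python
from math import sqrt

n = 600          # sample size from Example 8.11
count_a = 240    # observed number of A's

p = count_a / n
# n*p*q written as count_a * (n - count_a) / n to keep the arithmetic exact.
se_count = sqrt(count_a * (n - count_a) / n)

two_se = (count_a - 2 * se_count, count_a + 2 * se_count)
three_se = (count_a - 3 * se_count, count_a + 3 * se_count)
print(p, se_count, two_se, three_se)
```

The standard error is computed from the sample proportion in place of the unknown w, as the large-sample argument of 8.22 permits.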


8.23. We now turn to a general consideration of the problems of sampling which
have been exemplified above. In the first place, let us note the role of the sampling
distribution in this branch of the subject. We construct from the observations some statistic t.
The sampling distribution of this statistic will in general (but not always) depend on some
parameters of the parent population. The probability of the observed t then permits the
making of statements, by inverse probability, likelihood or otherwise, about these parameters,
and thus we are enabled to draw inferences about the parent population. The sampling
distribution is thus fundamental to the whole subject and several subsequent chapters
will be devoted entirely to the methods of finding distributions when the parent is specified.

If we wish to test some hypothesis about the parent which is expressible by the
determination of certain parameters a priori, the problem is fairly simple. Given the values
of the parameters, we can determine from the sampling distribution the probability of the
observed value of the statistic, and use this to assess the acceptability of the hypothesis.
Complications can arise even here, however, for in general, several statistics can be compiled
from the same sample, and they need not necessarily all lead to the same conclusion about
the hypothesis; for example, a sample might have a mean which throws doubt on the
hypothesis and a variance which does not. We shall discuss this difficulty more fully in
the second volume.


8.24. When the parameters of the population are not given a priori, we have the 
double problem of estimating the parameters from the sample and assigning probable 
limits to the estimates so obtained. We have already touched on some of the principles 
of estimation and shall develop the topic more systematically in due course. When we 
have obtained an estimate, itself a statistic, we seek its sampling distribution and
therefrom can assign probable limits to the population value. A special class of cases arises
when we can find a statistic whose sampling distribution depends on only one parameter 
of the population (as in the case of attributes). 


8.25. These latter types of problem permit of certain important approximations, 
namely in the case when the sample is large. We saw in Chapter 7 that under very general 
conditions the sum of n independent variables, distributed in whatever form, tends to 
normality as n tends to infinity. Now many of the ordinary statistics in current use can
be expressed as the sum of variates, e.g. all the moments; and many others may also
be shown to tend to normality for large samples. Thus we may approximate:

(a) By taking a statistic, calculated from the sample as if it were a population, to be
the estimate of the corresponding parameter in that population, e.g. the variance of the
sample may be taken as an estimate of the variance of the population.
(b) By calculating the mean and variance of the sampling distribution by using,
instead of the unknown parameter values, the statistic values calculated according to (a).
(c) By assuming that the distribution is normal and hence determining probabilities 
from the normal integral with the aid of the sampling mean and sampling variance (the 
latter being the square of the standard error). 
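
The three approximations (a) to (c) can be sketched for the simplest case, the sample mean. The function name, the data and the 95 per cent level below are illustrative assumptions, not part of the text.

```python
from math import sqrt
from statistics import NormalDist, fmean


def large_sample_limits(sample, level=0.95):
    """Probable limits for the population mean by the large-sample approximations."""
    n = len(sample)
    mean = fmean(sample)                                   # (a) statistic taken as the estimate
    variance = sum((x - mean) ** 2 for x in sample) / n    # (a) sample variance, as if a population
    se = sqrt(variance / n)                                # (b) standard error from estimated values
    z = NormalDist().inv_cdf(0.5 + level / 2)              # (c) normal deviate for the chosen level
    return mean - z * se, mean + z * se


# Hypothetical data: 100 observations.
sample = [1, 2, 3, 4] * 25
low, high = large_sample_limits(sample)
print(low, high)
```

Whether n = 100 is large enough depends on the statistic concerned, as 8.26 goes on to point out.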


8.26. Just how large n must be for such approximations to be valid is not always 
easy to say. For some distributions, particularly that of the mean, quite a satisfactory 
approximation is given by low values of n, say n > 30. For others n has to be much higher 
before the approximation begins to give satisfactory results, e.g. for the product-moment 
correlation coefficient (below, 14.5) even values as high as 500 are not good enough. 


8.27. In the following three chapters we discuss the approximate and accurate 
methods for determining sampling distributions. Chapter 9 deals with large samples 
and is thus devoted mainly to methods for determining standard errors. Chapter 10 deals 
with methods for determining sampling distributions exactly. Chapter 11 discusses 
methods of approximating to sampling distributions by finding their lower moments. 


NOTES AND REFERENCES 
For some interesting discussions of problems of sampling generally, see Jensen (1926),
Bowley (1926), Hilton (1924), Kiser (1934), Yates (1935) and Neyman (1934). For a
discussion of random sampling, see Kendall and Babington Smith (1938 and 1939) and Kendall
(1941). The various tables of random numbers are referred to in the Introduction.

Bowley, A. L. (1926), "Measurement of the Precision attained in Sampling," Bull. Int.
Stat. Inst., 22, premier livre.

Hilton, John (1924), "Enquiry by sample; an experiment and its results," Jour. Roy.
Statist. Soc., 87, 544.

Jensen, A. (1926), "Report on the representative method in statistics," Bull. Int. Stat.
Inst., 22, premier livre.

Kendall, M. G., and Babington Smith, B. (1938), "Randomness and random sampling
numbers," Jour. Roy. Statist. Soc., 101, 147, and (1939) "Second paper on
random sampling numbers," Supp. J.R. Statist. Soc., 6, 51.

Kendall, M. G. (1941), "A theory of randomness," Biometrika, 32, 1.

Kiser, C. V. (1934), "Pitfalls in sampling for population study," Jour. Amer. Stat. Assoc.,
29, 250.

Neyman, J. (1934), "On two different aspects of the representative method," Jour. Roy.
Statist. Soc., 97, 558.

Yates, F. (1935), "Some examples of biased sampling," Ann. Eugen., Lond., 6, 209.

Yule, G. Udny (1927), "On reading a scale," Jour. Roy. Statist. Soc., 90, 570.


EXERCISES 


8.1. Of 10,000 babies born in a particular country 5100 are male. Taking this to be
a random sample of the births in that country, show that it throws considerable doubt
on the hypothesis that the sexes are born in equal proportions.
Consider how far this conclusion would be modified if the sample consisted of 1000
births, 510 of which were male.


8.2. If the number of members of a population bearing an attribute A is relatively 
small, show that the standard error of the number of A's in the sample is the square root 
of that number. Show also that the number of A's in the sample is an unbiased and 
a maximum likelihood estimate of the parameter of the Poisson distribution expressing 
the distribution of the number of A’s in large samples from the population. 


8.3. By considering the hypergeometric distribution, show that if samples of n are
drawn from a finite population of N without replacement, and a proportion w of that
population bear an attribute A, then the standard error of the proportion p in the sample is

$$\left\{\frac{N - n}{N - 1} \cdot \frac{w(1 - w)}{n}\right\}^{1/2}.$$

Show also that p is an unbiased estimate of w.


8.4. (Tchebycheff's inequality.) Show that for any distribution


and hence that for any member drawn at random

$$P\{|x - E(x)| > t\sigma\} < \frac{1}{t^{2}}.$$

Show further that the variance of the sampling distribution of proportions bearing
an attribute A in samples of n from a population of attributes is not greater than $\frac{1}{4n}$. Hence
the probability that an observed proportion p differs from the true proportion w by more
than amount b is not greater than $\frac{1}{4nb^{2}}$.


(This gives us an exact result, no assumptions about the normality of the limiting 
form of the binomial or the use of estimates in calculating standard errors being involved. 
The limits are, however, much too wide.) 


8.5. If a proportion w has to be estimated from a simple random sample with
proportion p, and if f is the prior probability of w, then the posterior probability of w is,
according to Bayes' theorem, proportional to

$$f\, w^{np} (1 - w)^{nq}.$$

Show that this is a maximum if

$$\frac{1}{nf}\frac{\partial f}{\partial w} + \frac{p}{w} - \frac{q}{1 - w} = 0.$$

Hence, in general, as n increases, the solution tends to w = p, whatever the prior
probability of w. In other words, the maximum likelihood estimate is an approximation to
that given by Bayes' theorem as n tends to infinity, even if Bayes' postulate is not assumed.


CHAPTER 9 
STANDARD ERRORS 


9.1. Towards the close of the last chapter we discussed the estimation of statistical 
parameters from large samples and the type of judgment of their reliability which depends 
on the use of the standard error. It was remarked that, for large samples, an estimate 
of a parameter may be obtained by calculating from the sample values the value of the 
parameter in the sub-population composed by the sample; and it was established that 
for samples of n the standard error gives a valid measure of precision, provided that (a) the 
sampling distribution of the statistic under discussion approaches normality and (b) that n is
large in the sense there defined. It was also pointed out that a sufficiently accurate estimate 
of standard errors involving parent parameters could be obtained by using as the parameter 
values the corresponding statistics from the sample itself. 

Since the majority of statistics in current use do tend to normality the theory of large
samples is, in the main, devoted to the determination of standard errors. In this chapter
we shall obtain expressions
for the standard errors of the various statistics considered in previous chapters. To avoid
the usual square roots associated with the standard error we shall write our results as
sampling variances and covariances. Thus, for a statistic t we write the variance of its
sampling distribution as var (t), and similarly cov (t, u) for the covariance of two
statistics t and u.


9.2. By definition, the rth moment of a statistic t, that is the rth moment of its
sampling distribution, is the mean value of $t^r$ taken over all possible samples, and may be
written $E(t^r)$ (cf. 3.35). If the joint distribution of the variates $x_1, \ldots, x_n$, from which
t is calculated, is $dF(x_1, \ldots, x_n)$, then the rth moment of t is the integral of $t^r\, dF$ (considered
as a function of the x's) over the domain of the x's. In particular, if the sample is simple
and random and the parent distribution is dF, we have

$$E(t^{r}) = \int \cdots \int t^{r}\, dF(x_1) \cdots dF(x_n).$$

We are particularly interested in this chapter in the first and second moments of t,
that is, the mean and variance of its sampling distribution. It may be recalled that the
mean value of a sum is the sum of the mean values and that, if the variables are independent,
the mean value of a product is the product of the mean values (3.36). These results
will be repeatedly required.


Standard Errors of Moments 


9.3. In the first place we consider the standard errors of the wide class of statistics
depending on the moments, including the mean, variance, the Pearson measures of skewness
and kurtosis and the moments and cumulants themselves.

The sampling distributions of moments tend to normality under very general conditions
in virtue of the Central Limit Theorem. In fact, if the parent distribution is represented
by f(x) dx, the distribution of the jth power of x, say y, is easily seen to be
$\frac{1}{j} f(y^{1/j})\, y^{\frac{1}{j} - 1}\, dy$
and the jth moment is the sum of n independent variates, each of which is distributed in
that form.

It is not so obvious that functions of the moments such as $b_1$ and $b_2$ (the sample values
of the parameters $\beta_1$ and $\beta_2$) will tend to normality, and special investigations may have
to be made for particular statistics. Even at the present time it is often assumed without 
proof that certain statistics tend to normality, the feeling apparently being (so far as any 
feeling uprises into consciousness) that as most statistics do tend to normality the onus 
is on an objector to prove that any particular statistic does not. This is very dangerous 
to accurate inferential reasoning and the point is one to be borne in mind wherever a standard 
error is used. 

On a similar point, it should also be remembered that some statistics tend to normality 
more rapidly than others, and a given n may be large for some purposes but not for others.
So far as it is possible to generalise with safety, we can usually (but not always) assume 
values of n greater than 500 to be large; values greater than 100 are often great enough 
to be large for our purposes ; values below 100 are suspect in many instances; and values 
below 30 are very rarely large. 

In the following we shall adopt the usual convention in regard to the distinction of
parameters and statistics by writing Greek letters to represent the former and Roman
letters to represent the latter. We have, then, for the rth moment-statistic $m'_r$,
corresponding to the rth moment parameter $\mu'_r$,

$$m'_r = \frac{1}{n}\sum_{j=1}^{n} x_j^{r} \qquad (9.1)$$

and for the mean-moment

$$m_r = \frac{1}{n}\sum (x - m'_1)^{r}. \qquad (9.2)$$

9.4. Consider now the mean value of $m'_r$. Since the x's are independent we have

$$E(m'_r) = \frac{1}{n}E\left\{\sum (x^r)\right\} = \frac{1}{n}\sum E(x^r) = \mu'_r. \qquad (9.3)$$

The sampling variance of $m'_r$ is, by definition, $E(m'_r - \mu'_r)^2$ and thus assuming, as we do
throughout, that the appropriate moments exist,

$$\operatorname{var}(m'_r) = E(m'_r - \mu'_r)^2$$

$$= \frac{1}{n^2}E\left[\left\{\sum (x^r)\right\}^2 - 2n\mu'_r \sum (x^r) + n^2\mu'^2_r\right]$$

$$= \frac{1}{n^2}E\left[\left\{\sum (x^r)\right\}^2\right] - \mu'^2_r$$

$$= \frac{1}{n^2}E\left\{\sum (x_j^{2r}) + \sum (x_j^r x_k^r)\right\} - \mu'^2_r,$$

the second summation extending over the n(n − 1) cases in which j ≠ k (permutations
of j and k thus being allowed). Since the x's are independent the mean value of the product
is the product of the mean values, and thus

$$\operatorname{var}(m'_r) = \frac{1}{n^2}\left\{n\mu'_{2r} + n(n - 1)\mu'^2_r\right\} - \mu'^2_r$$

$$= \frac{1}{n}(\mu'_{2r} - \mu'^2_r). \qquad (9.4)$$

This is an exact result.
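
Since (9.4) is exact, it can be verified by brute force on a small discrete parent. The sketch below assumes a hypothetical three-point distribution and samples of three; it enumerates every ordered sample with its probability and compares the resulting variance of the second moment-statistic about the origin with the right-hand side of (9.4).

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete parent: values and their probabilities.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]


def raw_moment(r: int) -> Fraction:
    """Population moment of order r about the origin."""
    return sum(p * v**r for v, p in zip(values, probs))


def exact_var_of_moment_statistic(r: int, n: int) -> Fraction:
    """Variance of the rth moment-statistic over all ordered samples of size n."""
    first = Fraction(0)
    second = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]          # independence: probabilities multiply
        m = Fraction(sum(values[i] ** r for i in idx), n)
        first += weight * m
        second += weight * m * m
    return second - first**2


r, n = 2, 3
lhs = exact_var_of_moment_statistic(r, n)
rhs = (raw_moment(2 * r) - raw_moment(r) ** 2) / n
print(lhs == rhs)
```

Exact fractions are used so that the agreement is an identity and not a numerical coincidence.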
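In a similar way, if we have two moments, $m'_q$, $m'_r$, their sampling covariance is given by

$$\operatorname{cov}(m'_q, m'_r) = E\{(m'_q - \mu'_q)(m'_r - \mu'_r)\}$$

$$= E\left\{\left(\frac{1}{n}\sum x^q - \mu'_q\right)\left(\frac{1}{n}\sum x^r - \mu'_r\right)\right\}$$

$$= \frac{1}{n^2}E\left(\sum x^q \sum x^r\right) - \frac{1}{n}\mu'_r E\left(\sum x^q\right) - \frac{1}{n}\mu'_q E\left(\sum x^r\right) + \mu'_q\mu'_r$$

$$= \frac{1}{n}(\mu'_{q+r} - \mu'_q\mu'_r), \qquad (9.5)$$

which reduces to (9.4) if q = r, as it must; for the first product-moment of two identical
variables is their variance.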


9.5. The formulae for moments about the mean are not so simple, for the mean itself
is subject to sampling fluctuations. We have, in fact,

$$E(m_r) = \frac{1}{n}E\sum (x - m'_1)^{r}. \qquad (9.6)$$

Now putting r = 1 in (9.4) we find

$$\operatorname{var}(m'_1) = \frac{1}{n}(\mu'_2 - \mu'^2_1) = \frac{\mu_2}{n} \qquad (9.7)$$

and thus the standard error of the mean is $\sigma/\sqrt{n}$. Consequently, if the distribution is
anywhere near normality, nearly all the values of $m'_1$ will lie within a range of the true value
of order $n^{-1/2}$. To order $n^{-1/2}$ we may then, taking an origin at the mean of the
population, neglect powers of $m'_1$ higher than the first, and we have

$$E(m_r) = \frac{1}{n}E\sum (x^r - r x^{r-1} m'_1)$$

$$= \frac{1}{n}E\sum \left(x_j^r - \frac{r}{n} x_j^{r-1}\sum x_k\right)$$

$$= \frac{1}{n}E\sum \left(x_j^r - \frac{r}{n} x_j^r - \frac{r}{n} x_j^{r-1} x_k\right), \quad j \neq k,$$

$$= \mu_r, \qquad (9.8)$$

a result which is not, like (9.3), exact, but is an approximation to order $n^{-1}$. To this
order we have

$$\operatorname{var}(m_r) = E(m_r - \mu_r)^2$$

$$= E(m_r^2) - \mu_r^2$$
1 T ead 4 
а) a [Een — LX J = 22 
1 2r PN r T dm 2 ZTS) 
= 528 Zl) + Zle жг) + a8 gy 
TÈ -1 yT- 2r 41a r-1 2 я 
T 2020) ut cum P ay )r— д, jk. 


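The expectations of other terms occurring in the squaring vanish, since they contain $\mu_1$.
The expectation of $\frac{r^2}{n^4}\sum (x_j^2 x_k^{2r-2})$ is of order $n^{-2}$ and is thus to be neglected. The
remaining terms give us

$$\operatorname{var}(m_r) = \frac{1}{n}(\mu_{2r} - \mu_r^2 + r^2\mu_2\mu_{r-1}^2 - 2r\mu_{r-1}\mu_{r+1}). \qquad (9.9)$$

Similarly it appears that

$$\operatorname{cov}(m_r, m_q) = \frac{1}{n}(\mu_{r+q} - \mu_r\mu_q + rq\,\mu_2\mu_{r-1}\mu_{q-1} - r\mu_{r-1}\mu_{q+1} - q\mu_{r+1}\mu_{q-1}). \qquad (9.10)$$
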

Example 9.1 

From (9.7) we have

$$\operatorname{var}(m'_1) = \frac{\mu_2}{n}.$$

Now, for the height distribution of Table 1.7 we found (Examples 2.1 and 2.6) that
$m'_1 = 67.46$, $\sqrt{m_2} = 2.57$. Suppose we regard this distribution as a simple random sample
from the adult male inhabitants of the United Kingdom living at the time when the data
were collected. What can we say about the mean of the population?

The standard error of the mean depends on $\mu_2$. This is an unknown quantity, but we
may, in accordance with the general principles of large sample theory, use $m_2$ instead.
We then find

$$\text{Standard error of } m'_1 = \frac{2.57}{\sqrt{8585}} = 0.028 \text{ approximately.}$$

Thus we can say that the population mean probably lies in the range of twice this amount
on either side of the sample mean, i.e. in the range 67.46 ± 0.056, and very probably in
thrice the range, i.e. 67.46 ± 0.084. Our estimate of the mean would almost certainly be
less than a tenth of an inch in error.
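
The figures of Example 9.1 can be checked directly. In this sketch the sample size 8585 is read from the divisor in the worked example; the variable names are illustrative.

```python
from math import sqrt

n = 8585            # number of heights in the sample, as used in Example 9.1
sample_mean = 67.46
sample_sd = 2.57    # square root of the second mean-moment

se_mean = sample_sd / sqrt(n)   # (9.7) with the sample value in place of the parent variance

print(round(se_mean, 3))
print(sample_mean - 2 * se_mean, sample_mean + 2 * se_mean)
print(sample_mean - 3 * se_mean, sample_mean + 3 * se_mean)
```

The text rounds the standard error to 0.028 before doubling and trebling it, so the limits printed here differ slightly in the last figure from 0.056 and 0.084.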


Example 9.2 
From equation (9.9) with r = 4 we find

$$\operatorname{var}(m_4) = \frac{1}{n}(\mu_8 - \mu_4^2 - 8\mu_3\mu_5 + 16\mu_2\mu_3^2).$$

In Chapter 11 we shall show how to obtain this result by other methods of a more exact
character and confirm that it is, in fact, exact to order $n^{-1}$.


Example 9.3 


To show that in samples from a symmetrical population the first product-moment
between the mean and any mean-moment of even order vanishes to order $n^{-1}$.
We have, by definition,

$$\operatorname{cov}(m'_1, m_r) = E\left[\frac{1}{n}\sum x \cdot \frac{1}{n}\sum (x - m'_1)^{r}\right]$$

$$= \frac{1}{n^2}E\left\{\sum (x_j^{r+1}) - \frac{r}{n}\sum (x_j^2 x_k^{r-1})\right\}, \quad j \neq k,$$

the other terms vanishing, since they involve the unit power of x, if we take an origin at
the mean of the parent population,

$$= \frac{1}{n}(\mu_{r+1} - r\mu_2\mu_{r-1}).$$

Now if r is even, $\mu_{r+1}$ and $\mu_{r-1}$, being moments of odd order, will vanish for a
symmetrical population and hence

$$\operatorname{cov}(m'_1, m_r) = 0.$$

In the language of the theory of correlation (Chapter 14) the mean and the even moment
about the mean are uncorrelated to order $n^{-1}$.


Standard Errors of Functions of Moments


9.6. From the expressions we have just derived for the Sampling variances and 
covariances of moments, approximate expressions can be obtained for the sampling variances 


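of functions of the moments. Suppose $\phi(m)$ is such a function. We have the functional
relation in differences

$$\Delta\phi = \frac{\partial\phi}{\partial m_1}\Delta m_1 + \frac{\partial\phi}{\partial m_2}\Delta m_2 + \ldots + O(\Delta m)^2. \qquad (9.11)$$

Now any variations in m due to fluctuations of sampling are of order $n^{-1/2}$. To our
approximation, therefore, we may neglect the terms of order $(\Delta m)^2$ in (9.11), and the variation $\Delta\phi$
is then seen to be a linear function of the variations $\Delta m$ and is equivalent to an equation
in differentials; that is to say, since the m's are distributed normally in the limit, so will
$\phi$ be.* We have, from (9.11),

$$d\phi = \sum \frac{\partial\phi}{\partial m_j}\, dm_j.$$

Hence, measuring from the means of the m's and $\phi$, we find, squaring and taking mean values,

$$\operatorname{var}(\phi) = \sum\left\{\left(\frac{\partial\phi}{\partial m_j}\right)^2 \operatorname{var}(m_j)\right\} + \sum\left\{\frac{\partial\phi}{\partial m_j}\frac{\partial\phi}{\partial m_k}\operatorname{cov}(m_j, m_k)\right\}, \qquad (9.12)$$

the first summation extending over all the m's appearing in $\phi$, and the second over all
m's such that j ≠ k.

Similarly, for two functions $\phi_1$, $\phi_2$, we have

$$\operatorname{cov}(\phi_1, \phi_2) = \sum\left\{\frac{\partial\phi_1}{\partial m_j}\frac{\partial\phi_2}{\partial m_j}\operatorname{var}(m_j)\right\} + \sum\left\{\frac{\partial\phi_1}{\partial m_j}\frac{\partial\phi_2}{\partial m_k}\operatorname{cov}(m_j, m_k)\right\}. \qquad (9.13)$$

* J. B. D. Derksen (1939, Ann. Math. Stats., 10, 380) has considered the rigorisation of this process
in terms of what is called stochastic convergence.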


ow to obtain this result by other methods of a more exact 


а 




Example 9.4 
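To find the sampling variance of the fourth cumulant. We have

$$\kappa_4 = \mu_4 - 3\mu_2^2,$$

$$d\kappa_4 = d\mu_4 - 6\mu_2\, d\mu_2.$$

Hence, squaring and taking mean values,

$$\operatorname{var}(\kappa_4) = \operatorname{var}(\mu_4) - 12\mu_2 \operatorname{cov}(\mu_4, \mu_2) + 36\mu_2^2 \operatorname{var}(\mu_2).$$

Making the appropriate substitutions from (9.9) and (9.10) we have

$$\operatorname{var}(\kappa_4) = \frac{1}{n}\left\{\mu_8 - \mu_4^2 + 16\mu_2\mu_3^2 - 8\mu_3\mu_5 - 12\mu_2(\mu_6 - \mu_4\mu_2 - 4\mu_3^2) + 36\mu_2^2(\mu_4 - \mu_2^2)\right\}$$

$$= \frac{1}{n}\left\{\mu_8 - 12\mu_2\mu_6 - 8\mu_3\mu_5 - \mu_4^2 + 48\mu_2^2\mu_4 + 64\mu_2\mu_3^2 - 36\mu_2^4\right\}.$$

For a normal parent, $\mu_4 = 3\sigma^4$, $\mu_6 = 15\sigma^6$, $\mu_8 = 105\sigma^8$ and we have

$$\operatorname{var}(\kappa_4) = \frac{24\sigma^8}{n}.$$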
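
The coefficient 24 of Example 9.4 can be confirmed numerically. The sketch below assumes the standard central moments of the normal law (zero for odd orders, 1·3·5···(k − 1) times the kth power of the standard deviation for even k) and substitutes them in the general expression; the function names are illustrative.

```python
def normal_central_moment(k: int, sigma2: float = 1.0) -> float:
    """Central moment of order k of a normal law with variance sigma2."""
    if k % 2:
        return 0.0
    out = 1.0
    for j in range(1, k, 2):      # builds (k-1)!! * sigma2**(k/2)
        out *= j * sigma2
    return out


def n_times_var_kappa4(mu):
    """n * var(kappa_4) from the general formula of Example 9.4; mu maps order -> central moment."""
    return (mu[8] - 12 * mu[2] * mu[6] - 8 * mu[3] * mu[5] - mu[4] ** 2
            + 48 * mu[2] ** 2 * mu[4] + 64 * mu[2] * mu[3] ** 2 - 36 * mu[2] ** 4)


mu_unit = {k: normal_central_moment(k) for k in range(2, 9)}
print(n_times_var_kappa4(mu_unit))   # coefficient of sigma^8 / n for unit variance
```

Running the same function with another variance shows the dependence on the eighth power of the standard deviation.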


Example 9.5 
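To find the sampling variance of the coefficient of variation

$$V = \frac{\sqrt{m_2}}{m'_1}.$$

Taking logarithms and then differentials we have

$$\frac{dV}{V} = \frac{dm_2}{2m_2} - \frac{dm'_1}{m'_1}.$$

Whence, squaring and taking mean values,

$$\frac{\operatorname{var} V}{V^2} = \frac{\operatorname{var}(m_2)}{4m_2^2} - \frac{\operatorname{cov}(m_2, m'_1)}{m_2 m'_1} + \frac{\operatorname{var}(m'_1)}{m'^2_1}
= \frac{1}{n}\left\{\frac{\mu_4 - \mu_2^2}{4m_2^2} - \frac{\mu_3}{m_2 m'_1} + \frac{\mu_2}{m'^2_1}\right\}.$$

To our order of approximation we may write $\mu_2 = m_2$, $\mu'_1 = m'_1$ and find

$$\operatorname{var} V = \frac{V^2}{n}\left\{\frac{\mu_4 - \mu_2^2}{4\mu_2^2} - \frac{\mu_3}{\mu_2\mu'_1} + \frac{\mu_2}{\mu'^2_1}\right\}.$$

For a normal parent this gives ($\mu_4 = 3\mu_2^2$, $\mu_3 = 0$)

$$\operatorname{var} V = \frac{V^2}{n}\left(\frac{1}{2} + \frac{\mu_2}{\mu'^2_1}\right) = \frac{V^2}{2n}(1 + 2V^2)$$

$$= \frac{V^2}{2n} \text{ approximately.}$$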
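
The reduction in Example 9.5 can be illustrated numerically. This is a sketch only: the mean 10, variance 4 and sample size 400 are hypothetical, and the function simply evaluates the general large-sample expression for var V.

```python
def var_coefficient_of_variation(mu1, mu2, mu3, mu4, n):
    """Large-sample variance of V = sqrt(m2)/m1' in terms of the parent moments."""
    v_squared = mu2 / mu1**2
    bracket = (mu4 - mu2**2) / (4 * mu2**2) - mu3 / (mu2 * mu1) + mu2 / mu1**2
    return v_squared * bracket / n


# Hypothetical normal parent: mean 10, variance 4, so that V = 0.2.
mean, variance, n = 10.0, 4.0, 400
general = var_coefficient_of_variation(mean, variance, 0.0, 3 * variance**2, n)

v = variance**0.5 / mean
normal_form = v**2 * (1 + 2 * v**2) / (2 * n)   # reduced form for a normal parent
leading_term = v**2 / (2 * n)                   # the final approximation

print(general, normal_form, leading_term)
```

For V = 0.2 the leading term differs from the fuller expression by the factor 1 + 2V², i.e. by 8 per cent.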


9.7. On the above principles the standard errors of the more usual functions of 
moments, such as the measures of skewness and kurtosis, have been worked out and 
tabulated (see Tables for Statisticians and Biometricians and the references at the end of 
the chapter). 

In applications of results derived by the foregoing methods a few points are to be noted:

(a) The sampling variances are to be used only when the statistic under consideration
is calculated from the moments. For instance, if the standard deviation of a normal
curve is estimated by taking $\sqrt{\pi/2}$ times the mean deviation of the sample, instead of
the more usual root-mean-square, the formula

$$\operatorname{var}(\sigma) = \frac{\mu_4 - \mu_2^2}{4n\mu_2}$$

derived from (9.9) is not applicable (see below, 9.11).

(b) From (9.4) and (9.9) it will be seen that the sampling variance of a moment depends
on the moment of twice the order, i.e. becomes very large for higher moments, even when
n is large. This is the reason why such moments have very limited practical application.

(c) Some measures calculated from the moments tend to normality very slowly.
$\sqrt{b_1}$ or $b_2$ (the sample values of $\sqrt{\beta_1}$ or $\beta_2$) are cases in point, and more refined methods
which we discuss in Chapter 11 are preferable to the use of the standard error.

(d) The order of the approximation makes it necessary to exercise care in the
neighbourhood of vanishing values of standard errors. For instance, if the coefficient of variation
V = 0 in a sample, the formula of Example 9.5 would give var V = 0. But it does not,
of course, follow that there is no variation at all in the population, though none exists in
the sample and the presence of variation in the parent will be unlikely if the sample is at
all large. When V = 0 the quantities neglected in our approximation giving $\operatorname{var} V = \frac{V^2}{2n}$
become of some relative importance, though they are still small.

(e) It is interesting to compare the sampling fluctuations, as expressed in the sampling
variance, with Sheppard's corrections to the moments. Writing temporarily $s_1^2$ for the
uncorrected variance in the sample, $s_2^2$ for the corrected variance, we have

$$\frac{s_1^2}{s_2^2} = 1 + \frac{h^2}{12 s_2^2},$$

where h is the interval width. For many practical cases, if d is the number of intervals,
dh is about equal to $6s_2$, and thus

$$\frac{s_1^2}{s_2^2} = 1 + \frac{3}{d^2}, \qquad \frac{s_1}{s_2} = 1 + \frac{3}{2d^2} \text{ approximately.}$$

For a normal population we have

$$d\sigma = d\sqrt{\mu_2} = \frac{1}{2\sqrt{\mu_2}}\, d\mu_2$$

and hence

$$\operatorname{var}\sigma = \frac{1}{4\mu_2}\operatorname{var}\mu_2 = \frac{\mu_4 - \mu_2^2}{4\mu_2 n} = \frac{\sigma^2}{2n}.$$


Thus if n is, Say, 1000, the standard error 
Sheppard’s correction in a case where d — 
a sixth of the standard error, Tt i 
than 1000, in order to avoid s 


interpreted as implying a high iability i 
т plying a higher degree of reliability in the corrected valu 




Standard Error of Bivariate Moments 

9.8. Extensions of the above formulae to the bivariate case are made without difficulty, 
only slightly more complicated algebra being involved. The reader will be able to verify 
the following formulae for himself: 


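$$\operatorname{var}(m'_{r,s}) = \frac{1}{n}(\mu'_{2r,2s} - \mu'^2_{r,s}) \qquad (9.14)$$

$$\operatorname{cov}(m'_{r,s}, m'_{u,v}) = \frac{1}{n}(\mu'_{r+u,s+v} - \mu'_{r,s}\mu'_{u,v}) \qquad (9.15)$$

$$\operatorname{var}(m_{r,s}) = \frac{1}{n}(\mu_{2r,2s} - \mu_{r,s}^2 + r^2\mu_{2,0}\mu_{r-1,s}^2 + s^2\mu_{0,2}\mu_{r,s-1}^2
+ 2rs\,\mu_{1,1}\mu_{r-1,s}\mu_{r,s-1} - 2r\mu_{r+1,s}\mu_{r-1,s} - 2s\mu_{r,s+1}\mu_{r,s-1}) \qquad (9.16)$$

$$\operatorname{cov}(m_{r,s}, m_{u,v}) = \frac{1}{n}(\mu_{r+u,s+v} - \mu_{r,s}\mu_{u,v} + ru\,\mu_{2,0}\mu_{r-1,s}\mu_{u-1,v}
+ sv\,\mu_{0,2}\mu_{r,s-1}\mu_{u,v-1} + rv\,\mu_{1,1}\mu_{r-1,s}\mu_{u,v-1}
+ su\,\mu_{1,1}\mu_{r,s-1}\mu_{u-1,v} - u\mu_{r+1,s}\mu_{u-1,v}
- v\mu_{r,s+1}\mu_{u,v-1} - r\mu_{r-1,s}\mu_{u+1,v} - s\mu_{r,s-1}\mu_{u,v+1}) \qquad (9.17)$$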
Example 9.6 
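The coefficient of correlation is defined by

$$r = \frac{m_{11}}{\sqrt{(m_{20} m_{02})}}.$$

We have

$$\frac{dr}{r} = \frac{dm_{11}}{m_{11}} - \frac{dm_{20}}{2m_{20}} - \frac{dm_{02}}{2m_{02}}.$$

Thus

$$\frac{1}{r^2}\operatorname{var}(r) = \frac{\operatorname{var}(m_{11})}{m_{11}^2} + \frac{\operatorname{var}(m_{20})}{4m_{20}^2} - \frac{\operatorname{cov}(m_{11}, m_{20})}{m_{11}m_{20}} + \text{similar terms},$$

from which, substituting appropriate values from (9.16) and (9.17) and writing $\mu_{rs}$ for
$m_{rs}$ in the result, we have

$$\operatorname{var}(r) = \frac{\rho^2}{n}\left\{\frac{\mu_{22}}{\mu_{11}^2} + \frac{1}{4}\left(\frac{\mu_{40}}{\mu_{20}^2} + \frac{\mu_{04}}{\mu_{02}^2} + \frac{2\mu_{22}}{\mu_{20}\mu_{02}}\right) - \left(\frac{\mu_{31}}{\mu_{11}\mu_{20}} + \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right)\right\},$$

ρ being the same function of the μ's as r is of the m's. For the bivariate normal distribution
the substitution of values of Example 3.15 gives

$$\operatorname{var}(r) = \frac{1}{n}(1 - \rho^2)^2.$$

The use of the standard error to test the significance of the correlation coefficient is
not, however, to be recommended.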
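
The reduction of the general expression for var (r) to the bivariate normal form can be checked numerically. In this sketch the product-moments of the standardised bivariate normal distribution (unit variances, correlation ρ) are written out as an assumption; they are the standard values and stand in for those of Example 3.15, which is not reproduced here.

```python
def var_correlation(m, n):
    """Large-sample var(r) from population product-moments m[(r, s)]."""
    rho_squared = m[1, 1] ** 2 / (m[2, 0] * m[0, 2])
    bracket = (
        m[2, 2] / m[1, 1] ** 2
        + 0.25 * (m[4, 0] / m[2, 0] ** 2 + m[0, 4] / m[0, 2] ** 2
                  + 2 * m[2, 2] / (m[2, 0] * m[0, 2]))
        - (m[3, 1] / (m[1, 1] * m[2, 0]) + m[1, 3] / (m[1, 1] * m[0, 2]))
    )
    return rho_squared * bracket / n


def standard_bivariate_normal_moments(rho):
    """Assumed central product-moments for unit variances and correlation rho."""
    return {
        (2, 0): 1.0, (0, 2): 1.0, (1, 1): rho,
        (2, 2): 1 + 2 * rho**2,
        (4, 0): 3.0, (0, 4): 3.0,
        (3, 1): 3 * rho, (1, 3): 3 * rho,
    }


rho, n = 0.5, 100          # hypothetical values
general = var_correlation(standard_bivariate_normal_moments(rho), n)
normal_form = (1 - rho**2) ** 2 / n
print(general, normal_form)
```

The general function divides by the first product-moment and so cannot be evaluated at ρ = 0, although the normal form remains finite there.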


Standard Errors of Quantiles 

9.9. Among the various quantities measuring location and dispersion which we 
considered in Chapter 2 there was one group, namely the quantiles, which are not algebraic 
functions of the observations and whose sampling variances cannot accordingly be deter- 
mined by the above methods. We proceed to consider them now. 

Suppose the parent distribution is represented by $dF(x) = f(x)\,dx$. The probability that,
of a sample of n, $(l - 1)$ fall below a value $x_1$, one falls in the range $x_1 \pm \tfrac{1}{2}dx_1$, and the
remaining $(n - l)$ fall above $x_1$ is proportional to

$$F_1^{\,l-1} f_1\, dx_1\, (1 - F_1)^{n-l} = F_1^{\,l-1}(1 - F_1)^{n-l}\, dF_1, \qquad (9.18)$$

where $F_1 = F(x_1)$. This expression is accordingly the distribution function of $x_1$, the
member of the sample below which a proportion $\frac{l-1}{n}$ of the members fall, i.e. a quantile of the sample.

Put

$$l = nq$$

so that

$$n - l = n(1 - q) = np, \text{ say.}$$

The distribution (9.18) has a modal value given by differentiating the frequency function with
respect to $x_1$, i.e. (taking logarithms first) by

$$\frac{(l - 1) f_1}{F_1} - \frac{(n - l) f_1}{1 - F_1} + \frac{f'_1}{f_1} = 0, \qquad (9.19)$$

this equation being satisfied by the modal value $\hat{x}_1$. Now for large n, the term $\frac{f'_1}{f_1}$ will in
general be small compared with the other terms in (9.19), l and n − l being large. We may
therefore neglect it, and (9.19) becomes, to order $n^{-1}$,

$$\frac{q}{F_1} - \frac{p}{1 - F_1} = 0$$

or $F(\hat{x}_1) = q$.

This is in accordance with our general assumptions. To order $n^{-1}$ the quantile of the sample
is the quantile of the parent.

Now let us investigate the distribution (9.18) in the neighbourhood of the modal value.
Put

$$F_1 = q + \xi.$$

(9.18) becomes (neglecting constants)

$$(q + \xi)^{nq}(p - \xi)^{np}.$$

Taking logarithms and expanding we have, except for constants,

$$nq \log\left(1 + \frac{\xi}{q}\right) + np \log\left(1 - \frac{\xi}{p}\right)$$

$$= nq\left(\frac{\xi}{q} - \frac{\xi^2}{2q^2} + \ldots\right) + np\left(-\frac{\xi}{p} - \frac{\xi^2}{2p^2} - \ldots\right)$$

$$= -\frac{n\xi^2}{2pq} + \text{terms of order } \xi^3 \text{ and higher degree in } \xi.$$

Now for large samples ξ will be small compared with q, and we neglect the terms of higher
order. Thus the distribution of ξ is

$$dF \propto \exp\left(-\frac{n\xi^2}{2pq}\right) d\xi$$

or, evaluating the necessary constant by integration,

$$dF = \frac{1}{\sqrt{(2\pi)}\sqrt{\left(\frac{pq}{n}\right)}} \exp\left(-\frac{n\xi^2}{2pq}\right) d\xi,$$

showing that ξ is in the limit distributed normally with variance

$$\operatorname{var}(\xi) = \frac{pq}{n}.$$

This is the variance of ξ, which is a proportion. To find the variance of $x_1$ we note that
$d\xi = dF_1 = f_1\, dx_1$, and hence that

$$\operatorname{var}(x_1) = \frac{pq}{n f_1^2}. \qquad (9.22)$$

In practice this formula is often applied to grouped frequency-distributions, and in such
applications it is to be remembered that $f_1$, the ordinate of the parent, is to be taken as the
frequency per unit interval at $x_1$, this being the best estimate of the ordinate.


Example 9.7 


If $x_1$ is the median, p = q = ½ and we have

$$\operatorname{var}(\text{median}) = \frac{1}{4n f_1^2},$$

where $f_1$ is the median ordinate. For instance, if the parent population is normal, the median
ordinate is (from Appendix Table 1) $\frac{1}{\sigma} \times 0.39894$, $\sigma^2$ being the variance of the parent. Hence
the standard error of the median is

$$\frac{\sigma}{\sqrt{n}} \cdot \frac{1}{2 \times 0.39894} = 1.2533\frac{\sigma}{\sqrt{n}}.$$

The standard error of the mean in samples of n from a normal population is $\frac{\sigma}{\sqrt{n}}$, which
is thus considerably smaller than the standard error of the median.
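
The factor 1.2533 can be recovered from (9.22) with the normal ordinate computed directly instead of read from a table. This is a minimal sketch; the standard deviation and sample size are hypothetical and cancel in the ratio.

```python
from math import sqrt
from statistics import NormalDist


def se_quantile(q: float, density_at_quantile: float, n: int) -> float:
    """Standard error of a sample quantile from (9.22): sqrt(pq/n) / f."""
    return sqrt(q * (1 - q) / n) / density_at_quantile


sigma, n = 1.0, 100                  # hypothetical parent scale and sample size
parent = NormalDist(0.0, sigma)
f_median = parent.pdf(0.0)           # median ordinate, 0.39894 / sigma

se_median = se_quantile(0.5, f_median, n)
se_mean = sigma / sqrt(n)
ratio = se_median / se_mean
print(round(f_median, 5), round(ratio, 4))
```

The ratio is the square root of π/2, which is why the median needs roughly 57 per cent more observations than the mean for the same precision in normal samples.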


9.10. To find the covariance of two quantiles we generalise equation (9.18). If we
have a random sample of n individuals the probability that $(l - 1)$ lie below $x_1$, one lies at
$x_1 \pm \tfrac{1}{2}dx_1$, $(n - l - m)$ lie between $x_1$ and $x_2$, one at $x_2 \pm \tfrac{1}{2}dx_2$, and the remaining $(m - 1)$
above $x_2$ is

$$dF \propto F_1^{\,l-1}(F_2 - F_1)^{n-l-m}(1 - F_2)^{m-1}\, dF_1\, dF_2, \qquad (9.23)$$

where $F_1 = F(x_1)$, $F_2 = F(x_2)$.
We put

$$l = nq_1, \qquad m = np_2,$$

and find, for the equations giving the modal values corresponding to (9.19),

$$\frac{q_1}{F_1} - \frac{q_2 - q_1}{F_2 - F_1} = 0,$$

$$\frac{q_2 - q_1}{F_2 - F_1} - \frac{p_2}{1 - F_2} = 0,$$

giving, for the limiting modal values,

$$F(\hat{x}_1) = q_1, \qquad F(\hat{x}_2) = q_2, \qquad \text{where } p_2 + q_2 = 1. \qquad (9.24)$$

The conditions as to the relative smallness of $f'/f$ are satisfied in any ordinary case. Now
put

$$F_1 = q_1 + \xi_1,$$

$$F_2 = q_2 + \xi_2.$$

The joint distribution of $\xi_1$ and $\xi_2$ then becomes

$$dF \propto (q_1 + \xi_1)^{nq_1}(q_2 - q_1 + \xi_2 - \xi_1)^{n(q_2 - q_1)}(p_2 - \xi_2)^{np_2}\, d\xi_1\, d\xi_2.$$

On proceeding as in the previous section, taking logarithms, expanding and neglecting terms
in $\xi^3$ and higher, we find ultimately

$$dF \propto \exp\left[-\frac{n}{2(q_2 - q_1)}\left\{\frac{q_2}{q_1}\xi_1^2 - 2\xi_1\xi_2 + \frac{p_1}{p_2}\xi_2^2\right\}\right] d\xi_1\, d\xi_2. \qquad (9.25)$$

Thus the joint distribution of $\xi_1$ and $\xi_2$ tends to the bivariate normal form, and on comparing
(9.25) with the canonical form (Example 3.15) we see that

$$\frac{1}{(1 - \rho^2)\operatorname{var}(\xi_1)} = \frac{nq_2}{q_1(q_2 - q_1)},$$

$$\frac{1}{(1 - \rho^2)\operatorname{var}(\xi_2)} = \frac{np_1}{p_2(q_2 - q_1)},$$

$$\frac{\rho^2}{(1 - \rho^2)\operatorname{cov}(\xi_1, \xi_2)} = \frac{\rho}{(1 - \rho^2)\{\operatorname{var}(\xi_1)\operatorname{var}(\xi_2)\}^{1/2}} = \frac{n}{q_2 - q_1},$$

whence it is easy to find

$$\operatorname{var}(\xi_1) = \frac{p_1 q_1}{n},$$

$$\operatorname{var}(\xi_2) = \frac{p_2 q_2}{n},$$

$$\operatorname{cov}(\xi_1, \xi_2) = \frac{p_2 q_1}{n}. \qquad (9.26)$$

The asymmetry of the result for the covariance is due to the fact that $p_2$ relates necessarily
to the upper quantile. For the corresponding expression in $x_1$ and $x_2$ we have

$$\operatorname{cov}(x_1, x_2) = \frac{p_2 q_1}{n f_1 f_2}. \qquad (9.27)$$

With equations (9.26) and (9.27) we can find expressions for the variances of the quantile
range and similar statistics.


Example 9.8

The variance of the difference δ of two quantiles at $x_1$ and $x_2$ is given by

$$d\delta = dx_1 - dx_2,$$

$$\operatorname{var}(\delta) = \operatorname{var}(x_1) + \operatorname{var}(x_2) - 2\operatorname{cov}(x_1, x_2)$$

$$= \frac{p_1 q_1}{n f_1^2} + \frac{p_2 q_2}{n f_2^2} - \frac{2 p_2 q_1}{n f_1 f_2}.$$

When the quantiles are the two quartiles, $p_1 = \tfrac{3}{4} = q_2$, $p_2 = \tfrac{1}{4} = q_1$, and for the variance
of the semi-interquartile range we have

$$\operatorname{var}(\text{s.i.q.}) = \frac{1}{64n}\left(\frac{3}{f_1^2} - \frac{2}{f_1 f_2} + \frac{3}{f_2^2}\right),$$

where $f_1$, $f_2$ are the frequencies per unit interval at the lower and the upper
quartile. As $f_1$, $f_2$ have to be estimated we may also write

$$\operatorname{var}(\text{s.i.q.}) = \frac{\sigma^2}{64n}\left(\frac{3}{g_1^2} - \frac{2}{g_1 g_2} + \frac{3}{g_2^2}\right),$$

where $g_1$, $g_2$ are the actual sample frequencies at the quartiles and $\sigma^2$ is the sample variance.

For instance, if the parent distribution is normal, $g_1 = g_2$ and we find

$$\operatorname{var}(\text{s.i.q.}) = \frac{\sigma^2}{16 n g_1^2}.$$

From the tables the deviate corresponding to the quartile is 0.6745 and the ordinate at this
point, $g_1 = 0.3178$, so that the standard error of the semi-interquartile range is

$$\frac{\sigma}{\sqrt{n} \cdot 4 \times 0.3178}.$$
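
The quartile deviate, the ordinate and the resulting standard error of Example 9.8 can be reproduced as follows. This is a sketch under the assumption of a normal parent with unit standard deviation; the function name and the sample size are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def var_quantile_difference(q1, f1, q2, f2, n):
    """Variance of x1 - x2 for two quantiles (q1 < q2), from (9.22) and (9.27)."""
    p1, p2 = 1 - q1, 1 - q2
    return (p1 * q1 / f1**2 + p2 * q2 / f2**2 - 2 * p2 * q1 / (f1 * f2)) / n


std = NormalDist()            # standardised normal parent
z = std.inv_cdf(0.75)         # deviate of the upper quartile
g = std.pdf(z)                # ordinate there (equal at the lower quartile by symmetry)

n = 100                       # hypothetical sample size
var_siq = var_quantile_difference(0.25, g, 0.75, g, n) / 4   # s.i.q. is half the difference
factor = sqrt(var_siq * n)    # standard error = factor * sigma / sqrt(n)

print(round(z, 4), round(g, 4), factor)
```

The factor agrees with 1/(4 × 0.3178) from the closed form for a normal parent.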


9.11. In amplification of the point mentioned in 9.7 (a) it is worth while stressing
again the fact that a standard error is related to the way in which a parameter is estimated.
For instance, the standard deviation of a normal curve can be estimated from a sample in
several ways: from the second moment; by taking $\sqrt{\pi/2}$ times the mean deviation; by
taking $\frac{1}{0.6745}$ times the semi-interquartile range; and so on. Each method will have
its appropriate standard error, that for the first, for example, being $\frac{\sigma}{\sqrt{(2n)}}$ and that for
the third $\frac{1.64950\,\sigma}{\sqrt{(2n)}}$. At a later stage considerations such as this will lead us to the inquiry,
what is the estimate, if any, with the minimum sampling variance? For present purposes
it is enough to note the importance of not using a quoted formula without reference to the
method of estimation of the parameter concerned.


9.12. The methods we have developed provide the standard errors for large samples 
of most of the measures of location and dispersion and the measures introduced in Chapters 2 
and 3. There remain a few on which we have not yet touched, viz. the mean deviation, 
Gini's coefficient of mean difference, and the range. We consider them briefly in turn. 


Standard Error of the Mean Deviation

9.13. The mean deviation, as was pointed out in Chapter 2, is relatively speaking
a complicated function, and the mathematical difficulties attendant on absolute values are
well illustrated in discussions of its sampling variance. In fact, no general discussion of the 
sampling distribution appears to have been undertaken. The following exact value of the 
sampling variance in samples from a normal population was discovered by Helmert in 1876 
and rediscovered by Fisher in 1920. 


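$$\operatorname{var}(\text{m.d.}) = \frac{2\sigma^2 (n-1)}{n^2 \pi}\left\{\frac{\pi}{2} + \sqrt{n(n-2)} - n + \sin^{-1}\frac{1}{n-1}\right\}$$

$$\sim \frac{\sigma^2}{n}\left(1 - \frac{2}{\pi}\right) \quad \text{for large } n. \tag{9.28}$$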


The proof follows the general methods described in the next chapter. It is quoted here 
for the sake of completeness.* 
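As a numerical illustration, the sketch below compares the exact expression with its large-sample form. It assumes the Helmert-Fisher result in the shape var (m.d.) = 2σ²(n − 1)/(n²π) · {π/2 + √(n(n − 2)) − n + sin⁻¹ 1/(n − 1)} and the limit (σ²/n)(1 − 2/π); the function names are hypothetical.

```python
from math import asin, pi, sqrt


def var_mean_deviation_exact(n: int, sigma: float = 1.0) -> float:
    """Exact sampling variance of the mean deviation, normal parent, n >= 2."""
    bracket = pi / 2 + sqrt(n * (n - 2)) - n + asin(1 / (n - 1))
    return 2 * sigma**2 * (n - 1) / (n**2 * pi) * bracket


def var_mean_deviation_large_n(n: int, sigma: float = 1.0) -> float:
    """Large-sample approximation (sigma^2 / n)(1 - 2 / pi)."""
    return sigma**2 / n * (1 - 2 / pi)


# The ratio exact / approximate should approach unity as n increases.
for n in (2, 10, 100, 1000):
    ratio = var_mean_deviation_exact(n) / var_mean_deviation_large_n(n)
    print(n, round(ratio, 4))
```

For n = 2 the mean deviation is half the absolute difference of the two observations, which gives an independent check on the exact expression.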


* The distribution function of the mean deviation in normal samples from 2 to 10 has been tabulated
by Godwin and Hartley in Biometrika (1945), 33, 254.


Standard Error of the Mean Difference 


9.14. Nair (1936) has given a general expression for the standard error of Gini's mean
difference without repetition. In the manner of 2.24 it is easy to see that the coefficient
may be written

$$\Delta_1 = \frac{2}{n(n-1)}\{2U - (n+1)V\}, \tag{9.29}$$

where

$$U = \sum_{j=1}^{n} j x_j, \qquad V = \sum_{j=1}^{n} x_j,$$

and we write n in preference to N for the number of observations, since we are dealing
with a sample.


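In our usual notation, the probability that the jth observation in order of magnitude
in a series of n observations has value in the range $x \pm \tfrac12 dx$ is

$$\frac{n!}{(j-1)!\,(n-j)!}\, F^{j-1}(1-F)^{n-j}\, dF.$$

Hence the mean value of U is given by

$$E(U) = \sum_{j=1}^{n} j \int_{-\infty}^{\infty} x\, \frac{n!}{(j-1)!\,(n-j)!}\, F^{j-1}(1-F)^{n-j}\, dF$$

$$= n \int_{-\infty}^{\infty} x \sum_{j=1}^{n} j \binom{n-1}{j-1} F^{j-1}(1-F)^{n-j}\, dF$$

$$= n \int_{-\infty}^{\infty} x\,\{1 + (n-1)F\}\, dF. \tag{9.30}$$

Similarly

$$E(V) = n \int_{-\infty}^{\infty} x\, dF. \tag{9.31}$$

Thus

$$E(\Delta_1) = \frac{2}{n(n-1)}\{2E(U) - (n+1)E(V)\} = 2\int_{-\infty}^{\infty} x\,(2F - 1)\, dF.$$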
In the same way (but we omit the details) Nair finds 


(4?) on sh mU А (9.32) 


i 




where 


= | x*{(n — 1) — 4(n — 2)Ё + 4(n — 2)F2}dF 


h-[ ж ар." аара —3) — 20 — 5\F, — 2n — DF. 4n — 95. Fa) 
and finally 
$$\operatorname{var}(\Delta_1) = E(\Delta_1^2) - \{E(\Delta_1)\}^2. \tag{9.33}$$


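For three particular cases these integrals are worked out, giving:

Normal Parent:

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\, dx, \qquad -\infty < x < \infty$$

$$E(\Delta_1) = \frac{2\sigma}{\sqrt{\pi}}$$

$$\operatorname{var}(\Delta_1) = \frac{4\sigma^2}{n(n-1)}\left\{\frac{n+1}{3} + \frac{2(n-2)\sqrt{3}}{\pi} - \frac{2(2n-3)}{\pi}\right\} \tag{9.34}$$

$$\sim \frac{(0.8068)^2 \sigma^2}{n}. \tag{9.35}$$

Exponential Parent:

$$dF = \frac{1}{\sigma}\, e^{-x/\sigma}\, dx, \qquad 0 \le x < \infty$$

$$E(\Delta_1) = \sigma \tag{9.36}$$

$$\operatorname{var}(\Delta_1) = \frac{2(2n-1)\,\sigma^2}{3n(n-1)} \tag{9.37}$$

$$\sim \frac{4\sigma^2}{3n}. \tag{9.38}$$

Rectangular Parent:

$$dF = \frac{dx}{k}, \qquad 0 \le x \le k$$

$$E(\Delta_1) = \frac{k}{3} \tag{9.39}$$

$$\operatorname{var}(\Delta_1) = \frac{k^2}{45}\cdot\frac{n+3}{n(n-1)} \tag{9.40}$$

$$\sim \frac{k^2}{45n}. \tag{9.41}$$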
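The following sketch is offered as a check on the foregoing. It assumes that the mean difference without repetition may be computed either from all pairs or from the ordered values as 2{2U − (n + 1)V}/{n(n − 1)} with U = Σ j x_j and V = Σ x_j, and it evaluates the sampling variances for the three parents in the closed forms var = 4σ²{(n + 1)/3 + 2(n − 2)√3/π − 2(2n − 3)/π}/{n(n − 1)} (normal), 2(2n − 1)σ²/{3n(n − 1)} (exponential) and k²(n + 3)/{45n(n − 1)} (rectangular). All function names are ours.

```python
from itertools import combinations
from math import pi, sqrt


def gini_pairwise(xs):
    """Mean difference without repetition, from all pairs of observations."""
    n = len(xs)
    return 2 * sum(abs(a - b) for a, b in combinations(xs, 2)) / (n * (n - 1))


def gini_ordered(xs):
    """The same coefficient from the ordered values, via U and V."""
    n = len(xs)
    ordered = sorted(xs)
    u = sum(j * x for j, x in enumerate(ordered, start=1))  # U = sum of j * x_j
    v = sum(ordered)                                        # V = sum of x_j
    return 2 * (2 * u - (n + 1) * v) / (n * (n - 1))


def var_gini_normal(n, sigma=1.0):
    inner = (n + 1) / 3 + 2 * (n - 2) * sqrt(3) / pi - 2 * (2 * n - 3) / pi
    return 4 * sigma**2 * inner / (n * (n - 1))


def var_gini_exponential(n, sigma=1.0):
    return 2 * (2 * n - 1) * sigma**2 / (3 * n * (n - 1))


def var_gini_rectangular(n, k=1.0):
    return k**2 * (n + 3) / (45 * n * (n - 1))


sample = [3.1, -0.4, 2.2, 7.5, 0.0, 1.3]
print(gini_pairwise(sample), gini_ordered(sample))
print(sqrt(10**6 * var_gini_normal(10**6)))  # tends to the large-sample coefficient
```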


9.15. We now turn to consider some statistics which are peculiar in more ways than
one: the extreme values of a sample (or, more generally, the mth value from the top or the
bottom of a sample) and the range. One of the unusual features of the distribution of mth
values is that as n increases it diverges more and more from normality; and it seems doubtful
whether the distribution of range tends to any limit at all. Certainly it does not tend to
normality in all cases.

A further difference between the quantities we are now considering and the others we have
already discussed is that mth values and range in the sample are not used to estimate mth
values and range in the population. In fact most of the results we shall obtain relate to
parents which have an infinite range. What, then, is the use of these statistics? The
answer is that they may provide an estimate of parent parameters which do exist. For
instance, an estimate of the standard deviation of a normal population is given by dividing the sample
range w by a constant $d_n$ depending on the number in the sample. This estimate, though
not so accurate as some (in the sense that its sampling variance is not so small), is extremely
easy to calculate and is often useful. We wish, therefore, to know its sampling variance,
that is to say the sampling variance of the range.


Distribution of mth Values 


9.16. We consider first of all the distribution of mth values from the top, that for
mth values from the bottom being similar. In particular, m may be unity, in which case we
get the greatest member of a sample.

Quantiles are special cases of this class of statistic, the ratio m/n remaining finite as n
tends to infinity. In the case we now discuss m remains finite, so that the ratio m/n tends to
zero.

The distribution of mth values from the top is, as in equation (9.18),

$$dF \propto F_1^{\,n-m}(1 - F_1)^{m-1}\, dF_1. \tag{9.42}$$

When the form of $F_1$ is known, this equation is sometimes capable of exact solution, as in the
following example.


Example 9.9 


Consider the rectangular distribution $dF = dx$, $0 \le x \le 1$. Here $F(x) = x$ and the
distribution of the mth value from the top is

$$dF \propto x^{n-m}(1 - x)^{m-1}\, dx,$$

the Pearson Type I curve. We have, for the first moment,

$$\mu_1' = \frac{n - m + 1}{n + 1} = 1 - \frac{m}{n+1},$$

and for the variance

$$\mu_2 = \frac{m(n - m + 1)}{(n+1)^2 (n+2)},$$
but this sampling variance cannot be used in the ordinary way if m is finite, for the curve 
However, we may easily obtain exact values for the probabili- 


values, from the integrals of the Type I curve. In fact, the 


ue will not be attained or exceeded is, in the usual notation 
$I_x(n - m + 1,\, m)$.
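The moments of this example may be verified exactly. The sketch below assumes that the mth value from the top in a sample of n from the rectangular distribution on (0, 1) has frequency proportional to x^(n−m)(1 − x)^(m−1); it integrates term by term after a binomial expansion, using rational arithmetic so that no rounding enters. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb


def _integral(power: int, m: int) -> Fraction:
    """Exact integral over (0, 1) of x^power * (1 - x)^(m - 1)."""
    return sum(
        Fraction((-1) ** j * comb(m - 1, j), power + j + 1) for j in range(m)
    )


def raw_moment(n: int, m: int, k: int) -> Fraction:
    """kth moment about the origin of the mth value from the top."""
    return _integral(n - m + k, m) / _integral(n - m, m)


def mean_and_variance(n: int, m: int):
    mean = raw_moment(n, m, 1)
    variance = raw_moment(n, m, 2) - mean**2
    return mean, variance


mean, variance = mean_and_variance(10, 3)
print(mean, variance)
```

The results may be compared with (n − m + 1)/(n + 1) for the mean and m(n − m + 1)/{(n + 1)²(n + 2)} for the variance.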


9.17. From this point our discussion of the limiting form of (9.42) as n tends to infinity
is confined to the case wherein the parent population is a continuous frequency-distribution
of unlimited range of the exponential type, i.e. such that it tends to zero with large x as fast
as or faster than $dF = e^{-|x|}\, dx$, and that $\left|\dfrac{d}{dx}\log f(x)\right|$ exceeds some fixed number as x
tends to infinity. The normal curve obeys this criterion, which implies, among other things,
that all moments exist.


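For the mode of (9.42) we have, as in (9.19),

$$\frac{(n-m) f_1}{F_1} - \frac{(m-1) f_1}{1 - F_1} + \frac{f_1'}{f_1} = 0.$$

For large n and finite m the mode $\tilde{x}$ will be a large value, and both $f_1$ and $1 - F_1$ tend to zero.
Accordingly we may put

$$\frac{f_1}{1 - F_1} = -\frac{f_1'}{f_1}$$

in accordance with the rule known as L'Hôpital's. Hence

$$F_1(\tilde{x}) = 1 - \frac{m}{n}.$$

Now expand $F_1(x)$ in the neighbourhood of $\tilde{x}$ by Taylor's theorem. We get

$$F_1(x) = F_1(\tilde{x}) + (x - \tilde{x})\, f_1(\tilde{x}) + \frac{(x - \tilde{x})^2}{2!}\, f_1'(\tilde{x}) + \ldots$$

$$= 1 - \frac{m}{n} + \frac{m}{n}\cdot\frac{n}{m} f_1(\tilde{x})(x - \tilde{x}) - \frac{m}{n}\cdot\frac{1}{2!}\left\{\frac{n}{m} f_1(\tilde{x})\right\}^2 (x - \tilde{x})^2 + \ldots$$

(the last term in virtue of $f_1' \sim -\dfrac{n}{m} f_1^2$ in the neighbourhood of the mode)

$$= 1 - \frac{m}{n}\exp\left\{-\frac{n}{m}(x - \tilde{x})\, f_1(\tilde{x})\right\} \quad \text{approximately} \tag{9.43}$$

$$= 1 - \frac{m}{n}\, e^{-y_m}, \quad \text{say.} \tag{9.44}$$

The distribution of the mth value from the top may be written, from (9.42),

$$dF_m \propto \left(\frac{1}{F_1} - 1\right)^{m-1} d(F_1^{\,n}).$$

To our approximation, from (9.44), since $\dfrac{m}{n} e^{-y_m}$ is small,

$$\left(\frac{1}{F_1} - 1\right)^{m-1} = \left\{\frac{\dfrac{m}{n} e^{-y_m}}{1 - \dfrac{m}{n} e^{-y_m}}\right\}^{m-1} \sim \left(\frac{m}{n}\right)^{m-1} e^{-(m-1) y_m}$$

and

$$d(F_1^{\,n}) = n\left(1 - \frac{m}{n} e^{-y_m}\right)^{n-1} \frac{m}{n}\, e^{-y_m}\, dy_m \sim m \exp(-m e^{-y_m})\, e^{-y_m}\, dy_m.$$

Thus

$$dF_m \propto \exp(-m y_m - m e^{-y_m})\, dy_m$$

and, on evaluating the constants by integration, we get

$$dF_m = \frac{m^m}{\Gamma(m)} \exp(-m y_m - m e^{-y_m})\, dy_m. \tag{9.45}$$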


The new variable $y_m$ is defined in terms of x by

$$y_m = \frac{n}{m}\, f_1(\tilde{x})\,(x - \tilde{x}).$$

In a similar way, for the mth value from the bottom we find

$$dF_m = \frac{m^m}{\Gamma(m)} \exp(m y_m - m e^{y_m})\, dy_m, \tag{9.46}$$

$y_m$ being written for the variable defined by

$$F_1 = \frac{m}{n}\, e^{y_m}.$$

In particular, for the extremes (m = 1) we have

$$dF = \exp(-y - e^{-y})\, dy \quad \text{(top value)} \tag{9.47}$$

$$dF = \exp(y - e^{y})\, dy \quad \text{(bottom value)} \tag{9.48}$$


9.18. These unusual limiting forms, which are due to Gumbel (1934), the extreme cases
being due to Fisher and Tippett (1928), are very far from normal for moderate or low values
of m. For the moments of (9.45) we have (omitting the suffix of y for convenience)

$$\mu_1' = \frac{m^m}{\Gamma(m)} \int_{-\infty}^{\infty} y \exp(-m y - m e^{-y})\, dy.$$

Putting $t = m e^{-y}$ we get

$$\mu_1' = -\frac{1}{\Gamma(m)} \int_0^{\infty} (\log t - \log m)\, t^{m-1} e^{-t}\, dt$$

$$= -\left\{-\log m + \frac{d}{dm}\log \Gamma(m)\right\}$$

$$= \log m + \gamma - \sum_{r=1}^{m-1} \frac{1}{r}, \tag{9.49}$$

where $\gamma$ is Euler's constant.
For the rth moment about the mean we have

$$\mu_r = \frac{(-1)^r}{\Gamma(m)} \int_0^{\infty} (\log t - \log m + \mu_1')^r\, t^{m-1} e^{-t}\, dt$$

$$= \frac{(-1)^r}{\Gamma(m)} \left[\frac{d^r}{d\theta^r}\left\{e^{(\mu_1' - \log m)\theta}\, \Gamma(m + \theta)\right\}\right]_{\theta = 0}. \tag{9.50}$$

These formulae have been worked out further by Gumbel, from whose numerical results the
following are chosen:

     m       Mean     $\sqrt{\beta_1}$     $\beta_2 - 3$
     1      0.577       1.139               2.400
     3      0.176       0.621               0.763
     5      0.103       0.468               0.437
    10      0.051       0.324               0.212

These figures, which relate to the distribution from the top, show clearly that the limiting
distribution is far from normal. The distribution from the bottom is similar, odd moments
including the mean having the same magnitude but opposite sign, even moments being the
same.
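The tabulated constants can be approached numerically. The sketch below rests on assumptions of ours: that the mean of the limiting form for the mth value from the top is log m + γ − Σ 1/r (r from 1 to m − 1), and that its cumulants of order 2, 3 and 4 are (r − 1)! times ζ(r) less the first m − 1 terms of the series Σ 1/j^r. The constants and the function name are introduced only for illustration.

```python
from math import log, pi

EULER_GAMMA = 0.5772156649015329
# Sums of inverse powers: zeta(2), zeta(3), zeta(4).
ZETA = {2: pi**2 / 6, 3: 1.2020569031595943, 4: pi**4 / 90}


def limiting_shape(m: int):
    """Return (mean, sqrt(beta_1), beta_2 - 3) for the mth value from the top."""
    mean = log(m) + EULER_GAMMA - sum(1 / r for r in range(1, m))
    k2 = ZETA[2] - sum(r**-2 for r in range(1, m))        # second cumulant
    k3 = 2 * (ZETA[3] - sum(r**-3 for r in range(1, m)))  # third cumulant
    k4 = 6 * (ZETA[4] - sum(r**-4 for r in range(1, m)))  # fourth cumulant
    return mean, k3 / k2**1.5, k4 / k2**2


for m in (1, 3, 5, 10):
    print(m, [round(value, 3) for value in limiting_shape(m)])
```

The printed values may be set beside Gumbel's figures; small differences in the last place are to be expected from rounding.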


Moreover, the limiting forms (9.45) and (9.46) are reached extremely slowly. Fisher and
Tippett (1928) have shown in the case m = 1 that they do not provide a very satisfactory
approximation for values of n less than $10^{12}$. For practical purposes, therefore, there is still
no adequate general approximate form for the distribution of mth values.


9.19. The case m = 1, corresponding to the extremes of the sample, has, however,
been studied in more detail. In this case equation (9.42) becomes

$$dF = d(F_1^{\,n}).$$

By using the published tables of the normal integral $F_1$, Tippett (1925) has evaluated F for
values of n up to 1000, and given diagrams yielding the variances, and $\beta_1$ and $\beta_2$, which are
reproduced in Tables for Statisticians and Biometricians, Part II. The following values are
quoted from his results:

       n      Mean    Standard Deviation    $\beta_1$    $\beta_2$
       2     0.564         0.826             0.019       3.062
       5     1.163         0.669             0.092       3.202
      10     1.539         0.587             0.168       3.331
     100     2.508         0.429             0.429       3.765
     500     3.037         0.370             0.570       4.003
    1000     3.241         0.351             0.618       4.088

The values of $\beta_1$ and $\beta_2$ illustrate the point that as n increases, the distribution of the extreme
value diverges more and more from the normal form.

The limiting values as $n \to \infty$ can be derived by the use of characteristic functions.
In fact, we have for the distribution of the top value,

$$\phi(t) = \int_{-\infty}^{\infty} e^{itx} \exp(-x - e^{-x})\, dx,$$

which, on substituting $e^{-x} = \xi$, gives

$$\phi(t) = \int_0^{\infty} \xi^{-it} e^{-\xi}\, d\xi = \Gamma(1 - it).$$

Hence

$$\kappa_1 (it) + \kappa_2 \frac{(it)^2}{2!} + \ldots = \log \phi(t) = \log \Gamma(1 - it)$$

$$= \gamma (it) + \frac{S_2}{2}(it)^2 + \frac{S_3}{3}(it)^3 + \ldots\ ^{*}$$

Thus

$$\kappa_1 = \mu_1' = \gamma$$

$$\kappa_2 = \mu_2 = S_2 = \frac{\pi^2}{6} = 1.644934$$

$$\kappa_3 = \mu_3 = 2S_3 = 2.404114$$

$$\kappa_4 = \mu_4 - 3\mu_2^2 = 6S_4 = \frac{\pi^4}{15} = 6.493939$$

whence $\beta_1 = 1.299$, $\beta_2 = 5.4$.

* Cf. Edwards, Integral Calculus, vol. 2, article 916. $S_r$ here is $\sum_{n=1}^{\infty} \dfrac{1}{n^r}$.

These are evidently far from equal to the values for n = 1000 given above. Clearly the
limiting form is an inadequate approximation for values of n much higher than 1000.


9.20. The problem of bridging the gap between Tippett's values and the limiting
form has been considered by Fisher and Tippett (1928), and the argument which they
employ is interesting. Concentrating for a moment on the upper value, we note that the
upper member of a sample of kn members is the upper member of a sample of k of upper
members of samples of n. Both distributions will tend to the same limiting form, if it
exists; and consequently the limiting value must be such that the extreme member of a
sample of n from it must itself have that distribution. That is to say, if F is the probability
of an observation being less than x,

$$F^n(x) = F(a_n x + b_n), \tag{9.51}$$

where $a_n$ and $b_n$ are functions of n.
It may be shown from this equation that F must be one of three forms:

$$dF = \exp(-x - e^{-x})\, dx \tag{9.52}$$

$$dF = \frac{h}{x^{h+1}} \exp(-x^{-h})\, dx \tag{9.53}$$

$$dF = h(-x)^{h-1} \exp\{-(-x)^h\}\, dx \tag{9.54}$$

The first we have already reached. The second and third arise if the original distribution,
instead of tending to infinity exponentially, tends less rapidly such that

$$\lim_{x \to \infty} (1 - F)\, x^h \text{ exists and is not zero.}$$
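The functional equation may be illustrated numerically. In the sketch below we assume that the three forms have the distribution functions exp(−e^(−x)), exp(−x^(−h)) for positive x, and exp{−(−x)^h} for negative x, h being our symbol for the index of the second and third forms; and we assume the constants (a_n, b_n) = (1, −log n), (n^(−1/h), 0) and (n^(1/h), 0) respectively. The function names are hypothetical.

```python
from math import exp, isclose, log


def first_form(x: float) -> float:
    return exp(-exp(-x))


def second_form(x: float, h: float) -> float:
    return exp(-x ** (-h))          # defined for x > 0


def third_form(x: float, h: float) -> float:
    return exp(-((-x) ** h))        # defined for x < 0


def is_stable(n: int = 50, h: float = 3.0) -> bool:
    """Check F(x)^n = F(a_n x + b_n) at a few points for each form."""
    ok = all(
        isclose(first_form(x) ** n, first_form(x - log(n)), rel_tol=1e-9)
        for x in (-1.0, 0.5, 4.0, 8.0)
    )
    ok = ok and all(
        isclose(second_form(x, h) ** n, second_form(x * n ** (-1 / h), h), rel_tol=1e-9)
        for x in (2.0, 4.0, 10.0)
    )
    ok = ok and all(
        isclose(third_form(x, h) ** n, third_form(x * n ** (1 / h), h), rel_tol=1e-9)
        for x in (-0.8, -0.3, -0.05)
    )
    return ok


print(is_stable())
```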
The distribution (9.54) itself has (9.52) as a limiting form as h tends to infinity. It
has therefore been proposed as a "penultimate" form, to bridge the gap between n = 1000
and $n = 10^{12}$, which is apparently the first point at which the ultimate form provides a
reasonable approximation. For the penultimate form we have

$$\mu_r' = \int_{-\infty}^{0} h\, x^r (-x)^{h-1} \exp\{-(-x)^h\}\, dx$$

and on putting $-x = t^{1/h}$,

$$\mu_r' = (-1)^r \int_0^{\infty} t^{r/h} e^{-t}\, dt$$

$$= (-1)^r\, \Gamma\left(1 + \frac{r}{h}\right).$$


The following values illustrate the relationship between the known form
and the penultimate form for two convenient values of h:


| 
Standard Deviati | 
| { | viation Bı Ba 
т ы] 
| Penultimate | Actual Penultimate| Actual Penultimate| Actual 
0-0768 | 1000 0-3433 0-3514 0:548 . 
0-0845 | 500 0-3604 0-3704 0-498 б i 4008 


р 




Distribution of Range 
9.21. The range is the difference of the highest and the lowest value of a sample, and
the simultaneous distribution of top and bottom values is, from (9.23),

$$dF \propto (F_2 - F_1)^{n-2}\, dF_1\, dF_2$$

$$= n(n-1)(F_2 - F_1)^{n-2}\, dF_1\, dF_2. \tag{9.55}$$

The distribution function of the range w is then given by integrating this distribution over
values of $F_1$ and $F_2$ such that $x_2 - x_1 \le w$. So far as I am aware, it is not known whether
limiting forms of this distribution exist or what they are. It is, however, evident that
for large n the range is also large, and it seems doubtful whether the difference of two
variates which (for an unlimited curve) tend to $+\infty$ and $-\infty$ respectively has any general
limiting form. In any case one would suspect that the limiting form is reached slowly.

For particular cases equation (9.55) is soluble explicitly. The normal case has been 
fairly completely studied by Tippett (1925) and E. S. Pearson (1926 and 1932). Tippett 
found the first four moments of the distribution of the range, tabulated the mean values 
for values of n up to 1000 and gave a diagram for determining standard errors. (These 
tables and diagram are reproduced in Tables for Statisticians and Biometricians, Part II.)
Briefly, his approach is as follows :— 

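From (9.55) we have, for the mean range E(w),

$$E(w) = n(n-1) \int_{-\infty}^{\infty} \int_{-\infty}^{x_2} (F_2 - F_1)^{n-2} (x_2 - x_1)\, dF_1\, dF_2. \tag{9.56}$$

On expanding $(F_2 - F_1)^{n-2}$ we get terms under the second integral sign like

$$\int_{-\infty}^{x_2} F_1^{\,s} (x_2 - x_1)\, dF_1 = \left[\frac{(x_2 - x_1) F_1^{\,s+1}}{s+1}\right]_{-\infty}^{x_2} + \frac{1}{s+1} \int_{-\infty}^{x_2} F_1^{\,s+1}\, dx_1$$

$$= \frac{1}{s+1} \int_{-\infty}^{x_2} F_1^{\,s+1}\, dx_1 = \frac{U_{s+1}}{s+1}, \quad \text{say.}$$

Then

$$E(w) = n(n-1) \sum_{s=0}^{n-2} \frac{(-1)^s}{s+1} \binom{n-2}{s} \int_{-\infty}^{\infty} F_2^{\,n-2-s}\, U_{s+1}\, dF_2.$$

But

$$\int_{-\infty}^{\infty} F_2^{\,n-2-s}\, U_{s+1}\, dF_2 = \left[\frac{F_2^{\,n-1-s}\, U_{s+1}}{n-1-s}\right]_{-\infty}^{\infty} - \frac{1}{n-1-s} \int_{-\infty}^{\infty} F_2^{\,n-1-s} F_2^{\,s+1}\, dx_2.$$

Hence

$$E(w) = n(n-1) \sum_{s=0}^{n-2} \frac{(-1)^s}{(s+1)(n-1-s)} \binom{n-2}{s} \int_{-\infty}^{\infty} (1 - F_1^{\,n-1-s})\, F_1^{\,s+1}\, dx_1$$

$$= \int_{-\infty}^{\infty} \{1 - F_1^{\,n} - (1 - F_1)^n\}\, dx_1. \tag{9.57}$$

In a similar way it is found that

$$\operatorname{var}(w) = 2 \int_{-\infty}^{\infty} \int_{-\infty}^{x_2} \{1 - F_2^{\,n} - (1 - F_1)^n + (F_2 - F_1)^n\}\, dx_1\, dx_2 - \{E(w)\}^2. \tag{9.58}$$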
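A rough numerical check on the mean range in normal samples is easily made. The sketch assumes the single-integral form E(w) = ∫{1 − Fⁿ − (1 − F)ⁿ} dx for a unit normal parent and evaluates it by the trapezoidal rule over a range wide enough to contain effectively all the frequency; the limits, the step count and the function names are our own choices.

```python
from math import erf, pi, sqrt


def normal_cdf(x: float) -> float:
    """Distribution function of the unit normal curve."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def mean_range(n: int, lower: float = -10.0, upper: float = 10.0, steps: int = 20000) -> float:
    """Mean range in samples of n from a unit normal parent (trapezoidal rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        f = normal_cdf(lower + i * width)
        value = 1 - f**n - (1 - f) ** n
        total += value / 2 if i in (0, steps) else value
    return total * width


print(round(mean_range(2), 4), round(2 / sqrt(pi), 4))  # n = 2 has the closed form 2 / sqrt(pi)
print(round(mean_range(10), 3))
```

For n = 2 and n = 10 the results should be twice the mean extreme values quoted from Tippett in 9.19, since the mean range in a symmetrical parent is twice the mean of the greatest value.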


This equation was used by Tippett to obtain values of the variance for n up to 1000. The
following values illustrate the general behaviour of the distribution:

       n     Standard Deviation    $\beta_1$ (approximate)    $\beta_2$ (approximate)
       2          0.853                  0.99                       3.87
      10          0.797                  0.16                       3.15
     100          0.605                  0.21                       3.38
     500          0.524                  0.29                       3.50
    1000          0.497                  0.31                       3.54

Again it would appear that as n increases, the distribution of range diverges more and more
from the normal form.

The distribution function of the range in normal samples has recently been tabulated
by E. S. Pearson and Hartley (1942).

List of Standard Errors of Commonly Occurring Statistics


9.22. In view of the general utility of the standard error it may be convenient to 
bring together at this point for reference a number of sampling variances and other results. 
Some of these have already been obtained in this chapter; others are direct consequences 


of the formulae or methods developed ; and some will be proved later in the book. 

Mean. $\operatorname{var}(m_1') = \dfrac{\mu_2}{n} = \dfrac{\sigma^2}{n}$, where $\sigma$ is the standard deviation of the parent. This
is true in particular for a normal parent. The mean is always estimated from the mean
of the sample.

Variance. $\operatorname{var}(m_2) = \dfrac{\mu_4 - \mu_2^2}{n}$. For the normal parent $\operatorname{var}(m_2) = \dfrac{2\sigma^4}{n}$. Tables
are given for this case in T.S.B. I*. These results are appropriate to the case where the
variance is estimated from the sample variance. For numerical results for other cases
see Davies and E. S. Pearson (1934).

Standard Deviation. $\operatorname{var}(s) = \dfrac{\mu_4 - \mu_2^2}{4n\mu_2}$. For normal parent $\operatorname{var}(s) = \dfrac{\sigma^2}{2n}$. These
are the values for estimates from the square root of the sample variance. See previous
note on variance.

Third Moment about the Mean. $\operatorname{var}(m_3) = \dfrac{\mu_6 - \mu_3^2 - 6\mu_4\mu_2 + 9\mu_2^3}{n}$. For normal
parent $\operatorname{var}(m_3) = \dfrac{6\sigma^6}{n}$. The third and higher moments are always estimated from the
moments of the sample.

Fourth Moment about the Mean. $\operatorname{var}(m_4) = \dfrac{\mu_8 - \mu_4^2 - 8\mu_5\mu_3 + 16\mu_2\mu_3^2}{n}$. For
normal parent $\operatorname{var}(m_4) = \dfrac{96\sigma^8}{n}$.

Coefficient of Variation. $\operatorname{var}(V) = \dfrac{V^2}{n}\left\{\dfrac{\mu_4 - \mu_2^2}{4\mu_2^2} + \dfrac{\mu_2}{\mu_1'^2} - \dfrac{\mu_3}{\mu_2\mu_1'}\right\}$. For normal parent
$\operatorname{var}(V) = \dfrac{V^2}{2n}$ approximately. Tables given in T.S.B. I.

* An abbreviation for Tables for Statisticians and Biometricians, Part I.


$\beta_1$. $\operatorname{var} b_1 = \dfrac{\beta_1}{n}(4\beta_4 - 24\beta_2 + 36 + 9\beta_1\beta_2 - 12\beta_3 + 35\beta_1)$. For normal parent
$\operatorname{var} \sqrt{b_1} = \dfrac{6}{n}$. Tables given in T.S.B. I. The distribution is fairly skew for moderately
large n and the methods of Chapter 11 provide better tests of $\beta_1$ as a measure of departure
from normality. See 11.23. (The $\beta$'s are defined in equation (3.65).)

$\beta_2$. $\operatorname{var} b_2 = \dfrac{1}{n}(\beta_6 - 4\beta_2\beta_4 + 4\beta_2^3 - \beta_2^2 + 16\beta_2\beta_1 - 8\beta_3 + 16\beta_1)$. Tables given in
T.S.B. I.

Pearson Measure of Skewness (Equation (3.64)). Tables given in T.S.B. I. Probably
skew for moderate n. See note on $\beta_1$.

Pearson Mode. Formulae and tables given in Yasukawa (1926), the results of course
being only applicable to modes calculated from the Pearson formula (equation (3.62)).
Distribution may be skew for moderate n.

Coefficient of Contingency. See 13.14.

Coefficient of Association. See 13.8.

Tetrachoric r. See 14.28.

Mean Deviation. General formulae not known. See 9.13. For normal parent
$\operatorname{var}(\text{m.d.}) = \dfrac{\sigma^2}{n}\left(1 - \dfrac{2}{\pi}\right)$.

Gini's Mean Difference. See 9.14. For normal case $\operatorname{var}(\Delta_1) = \dfrac{(0.8068)^2\sigma^2}{n}$.

Median. $\operatorname{var}(\text{median}) = \dfrac{1}{4n y_0^2}$, where $y_0$ is the median ordinate of the sample. For
normal parent $\operatorname{var}(\text{median}) = \dfrac{(1.2533)^2\sigma^2}{n}$. For small samples from normal population, tables
and formulae given in Hojo (1931). Results to higher order in n given by K. Pearson (1931).

Quartiles. $\operatorname{var}(Q) = \dfrac{3}{16 n y^2}$, where y is the ordinate at the quartile concerned. For normal
parent, $\operatorname{var}(Q) = \dfrac{(1.3626)^2\sigma^2}{n}$. Results for small samples from normal population given in
Hojo (1931).

Semi-interquartile range. $\operatorname{var}(\text{s.i.q.}) = \dfrac{1}{64n}\left\{\dfrac{3}{y_1^2} + \dfrac{3}{y_2^2} - \dfrac{2}{y_1 y_2}\right\}$, where $y_1$, $y_2$ are the
quartile ordinates. For normal parent $\operatorname{var}(\text{s.i.q.}) = \dfrac{(0.7867)^2\sigma^2}{n}$.

Deciles. For the normal parent, variances are

    for deciles 4, 6    $(1.2680)^2\sigma^2/n$
    for deciles 3, 7    $(1.3180)^2\sigma^2/n$
    for deciles 2, 8    $(1.4288)^2\sigma^2/n$
    for deciles 1, 9    $(1.7094)^2\sigma^2/n$


Range. See 9.21.

Correlation Coefficient. See 14.10. For normal case $\operatorname{var} r = \dfrac{(1 - \rho^2)^2}{n}$. But it
is better to use Fisher's transformation (14.18) or the Tables by David (1938).

Coefficient of Regression. See 14.10 and 14.11. For normal case $\operatorname{var}(b) = \dfrac{\sigma_2^2}{\sigma_1^2}\cdot\dfrac{1 - \rho^2}{n}$.
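The numerical coefficients quoted above for the median, quartiles and deciles of a normal parent can be recomputed as a check. The sketch assumes the large-sample form var = pq/(n y²) for the quantile cutting off a proportion p, y being the ordinate of the parent at that quantile; for a normal parent the standard error is then a multiple of σ/√n, and the function (a name of our own) returns that multiple.

```python
from math import sqrt
from statistics import NormalDist


def quantile_se_factor(p: float) -> float:
    """Coefficient c in s.e. = c * sigma / sqrt(n) for the p-quantile, normal parent."""
    std = NormalDist()
    ordinate = std.pdf(std.inv_cdf(p))   # unit normal ordinate at the quantile
    return sqrt(p * (1 - p)) / ordinate


for label, p in [("median", 0.5), ("quartile", 0.25), ("decile 4", 0.4),
                 ("decile 3", 0.3), ("decile 2", 0.2), ("decile 1", 0.1)]:
    print(label, round(quantile_se_factor(p), 4))
```

By the symmetry of the normal curve the same multiple serves for the complementary quantile, so that deciles 4 and 6, 3 and 7, and so on, share a coefficient.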


Standard Errors of Sums and Differences 
9.23. Suppose we have two variables $x_1$, $x_2$, which may or may not be independent.
We have, if z is their sum,

$$E(z) = E(x_1) + E(x_2),$$

or the mean of z is the sum of the means of $x_1$ and $x_2$. If then we measure $x_1$ and $x_2$ about
their respective means, the mean of z is zero and thus

$$\operatorname{var} z = E(z^2) = E(x_1 + x_2)^2$$

$$= E(x_1^2) + 2E(x_1 x_2) + E(x_2^2)$$

$$= \operatorname{var} x_1 + 2\operatorname{cov}(x_1, x_2) + \operatorname{var} x_2. \tag{9.59}$$

Similarly for the difference of two variables we have

$$\operatorname{var} z = \operatorname{var} x_1 - 2\operatorname{cov}(x_1, x_2) + \operatorname{var} x_2. \tag{9.60}$$

In particular, if $x_1$ and $x_2$ are independent their covariance vanishes, for it becomes
the product of the two means, each of which is zero. In this important case we have, for
the sum,

$$\operatorname{var}(x_1 + x_2) = \operatorname{var} x_1 + \operatorname{var} x_2 \tag{9.61}$$

and for the difference

$$\operatorname{var}(x_1 - x_2) = \operatorname{var} x_1 + \operatorname{var} x_2. \tag{9.62}$$

These results are of fundamental importance: the variance of the sum or difference
of two independent random variables is the sum of their variances. Generally if

$$z = a_1 x_1 + a_2 x_2 + \ldots + a_n x_n$$

and the n variables are independent,

$$\operatorname{var} z = a_1^2 \operatorname{var} x_1 + a_2^2 \operatorname{var} x_2 + \ldots + a_n^2 \operatorname{var} x_n. \tag{9.63}$$

In particular we have, for the sampling variance of the difference of the means of two
independent samples, say $m_1$ and $p_1$,

$$\operatorname{var}(m_1 - p_1) = \frac{\mu_2}{n_1} + \frac{\nu_2}{n_2}, \tag{9.64}$$

$\mu_2$ and $\nu_2$ being the respective variances and $n_1$, $n_2$ the respective numbers in the samples.


Example 9.10 


A random sample of 1,000 men from the North of England shows their mean wage
to be 47 shillings a week with a standard deviation of 28 shillings. A random sample of
1,500 men from the South gives a mean wage of 49 shillings a week with a standard
deviation of 40 shillings. Required to discuss the question whether the mean wage
differs between North and South.

The difference of the means is 2 shillings and we wish to know whether this is significant.
From (9.64), taking as usual with large samples the unknown variances to be those of the
samples, we find

$$\operatorname{var}(\text{difference}) = \frac{28^2}{1000} + \frac{40^2}{1500} = 1.851.$$

The standard error is thus 1.36 and the difference in means, being less than twice this
amount, is hardly significant of any real difference. Had the difference been three shillings
instead of two we should probably have concluded that the difference, being more than
twice the standard error, was significant.

There is an alternative approach to this problem which is worth noticing. Suppose
we assume as our hypothesis under test that the distribution of wages in the two areas
is the same. The difference in the variances makes this rather unlikely, but on that
assumption we may combine the sample figures to give a new estimate of the mean and
variance in this distribution, e.g. the mean might be taken to be given by

$$\frac{(1000 \times 47) + (1500 \times 49)}{2500} = 48.2 \text{ shillings.}$$

In the first sample the sum of squares of deviations about the mean 47 is

$$1000 \times 28^2 = 784{,}000,$$

and hence the sum about the origin is $784{,}000 + (47^2 \times 1000) = 2{,}993{,}000$. Similarly
in the second sample the sum of squares of deviations about the origin is

$$1500\,(40^2 + 49^2) = 6{,}001{,}500.$$

The second moment of the whole about the origin is then $\dfrac{8{,}994{,}500}{2500} = 3597.8$, and hence
the variance is $3597.8 - (48.2)^2 = 1274.56$. We might take this as our estimate of the
variance in the population and our problem would then be: does the mean in one of the
parts of the whole sample, say the first, 47 shillings, differ significantly from the mean
of the whole, 48.2 shillings?

Now at first sight it looks as if this is a case for the application of (9.64). We have
two means, 47 and 48.2, with respective variances 784 and 1274.56, and require to know
whether the means are significantly different. But the samples are no longer independent,
for one of them is part of the other, and a modified formula must be used. If the means of
the separate samples are $m_1 \left(= \dfrac{\Sigma x_1}{n_1}\right)$ and $p_1 \left(= \dfrac{\Sigma x_2}{n_2}\right)$ the mean of the two together
is given by

$$\frac{n_1 m_1 + n_2 p_1}{n_1 + n_2} = \frac{\Sigma x_1 + \Sigma x_2}{n_1 + n_2}.$$

The difference of $m_1$ and this quantity, say q, is then

$$q = \frac{\Sigma x_1}{n_1} - \frac{\Sigma x_1 + \Sigma x_2}{n_1 + n_2}$$

$$= \frac{1}{n_1 + n_2}\left(\frac{n_2}{n_1}\Sigma x_1 - \Sigma x_2\right).$$

Thus

$$E(q) = \frac{1}{n_1 + n_2}\left(n_2 \mu_1' - n_2 \mu_1'\right) = 0,$$

and hence

$$\operatorname{var} q = E(q^2) = \frac{1}{(n_1 + n_2)^2}\, E\left(\frac{n_2}{n_1}\Sigma x_1 - \Sigma x_2\right)^2.$$

Since $x_1$ and $x_2$ are independent, this reduces to

$$\frac{1}{(n_1 + n_2)^2}\left(\frac{n_2^2}{n_1^2}\, n_1 \mu_2 + n_2 \mu_2\right) = \frac{n_2\, \mu_2}{n_1 (n_1 + n_2)}.$$

In our case $n_1 = 1000$, $n_2 = 1500$ and our estimate of $\mu_2$ is 1274.56. The variance of the
difference then becomes, on substitution, 0.7647. The observed difference is 48.2 − 47 = 1.2.
Once again this is less than twice the standard error ($= \sqrt{0.7647} = 0.87$) and again we
conclude that the difference is not significant.
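The arithmetic of this example may be retraced as follows. The sketch takes the sample figures as given above and assumes, for the second approach, that the variance of the difference between the first mean and the mean of the whole is n₂μ₂/{n₁(n₁ + n₂)}; the variable names are our own.

```python
from math import sqrt

# Sample figures from the example (shillings per week).
n1, mean1, sd1 = 1000, 47.0, 28.0    # North
n2, mean2, sd2 = 1500, 49.0, 40.0    # South

# First approach: difference of the means of two independent samples.
var_diff = sd1**2 / n1 + sd2**2 / n2
se_diff = sqrt(var_diff)

# Second approach: pool the samples under the hypothesis of a common distribution.
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
second_moment = (n1 * (sd1**2 + mean1**2) + n2 * (sd2**2 + mean2**2)) / (n1 + n2)
pooled_var = second_moment - pooled_mean**2

# The first mean is compared with the mean of the whole, of which it is a part.
var_q = n2 * pooled_var / (n1 * (n1 + n2))
se_q = sqrt(var_q)

print(round(var_diff, 3), round(se_diff, 2))
print(round(pooled_mean, 1), round(pooled_var, 2), round(var_q, 4), round(se_q, 2))
print(abs(mean2 - mean1) < 2 * se_diff, abs(pooled_mean - mean1) < 2 * se_q)
```

Both comparisons fall short of twice the standard error, in agreement with the conclusion reached in the text.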


REFERENCES 


Davies, O. L., and Pearson, E. S. (1934), "Methods of estimating from samples the population
standard deviation," Supp. Jour. Roy. Stat. Soc., 1, 76.

Fisher, R. A. (1920), "A mathematical examination of the methods of determining the
accuracy of an observation by the mean error and the mean square error," Monthly
Notices Roy. Astr. Soc., 80, 758.

Fisher, R. A., and Tippett, L. H. C. (1928), "Limiting forms of the frequency distribution of the
largest or smallest member of a sample," Proc. Camb. Phil. Soc., 24, 180.

Gumbel, E. J. (1934), "Les valeurs extrêmes des distributions statistiques," Annales de
l'Institut Henri Poincaré, 5, 115.

Hartley, H. O. (1942), "The range in normal samples," Biometrika, 32, 334.

Helmert (1876), Astronomische Nachrichten, 88, No. 2096.

Hojo, T. (1931), "Distribution of the median, quartiles and interquartile distance in
samples from a normal population," Biometrika, 23, 315.

Kondo, T. (1929), "On the standard error of the mean square contingency," Biometrika,
21, 376.

Nair, U. S. (1936), "The standard error of Gini's mean difference," Biometrika, 28, 428.

Pearson, E. S. (1926), "A further note on the distribution of range in samples taken from
a normal population," Biometrika, 18, 173.

Pearson, E. S., and Adyanthaya, N. K. (1928), "The distribution of frequency constants in small
samples from non-normal symmetrical and skew populations," Biometrika, 20A,
356.

Pearson, E. S. (1932), "The percentage limits of the distribution of range in samples from a normal
population," Biometrika, 24, 404.

Pearson, E. S., and Haines, Joan (1935), "The use of range in place of standard deviation in small
samples," Supp. Jour. Roy. Stat. Soc., 2, 83.

Pearson, E. S., and Hartley, H. O. (1942), "The probability integral of the range in samples of n
observations from a normal population," Biometrika, 32, 301.

Pearson, K., and Filon, L. N. G. (1898), "On the probable errors of frequency constants
and on the influence of random selection on variation and correlation," Phil.
Trans., 191A, 229.

Pearson, K., "On the probable errors of frequency constants," Part I, Biometrika, 1903,
2, 273; Part II, Biometrika, 1913, 9, 1; Part III, Biometrika, 1920, 13, 113.

Pearson, K. (1913), "On the probable error of a coefficient of correlation as found from a
fourfold table," Biometrika, 9, 22.

Pearson, K. (1915), "On the probable error of a coefficient of mean square contingency,"
Biometrika, 10, 590.

Pearson, K. (1931), "On the standard error of the median to a third approximation," Biometrika,
23, 361.

Pearson, K., and Pearson, M. V. (1931), "On the mean character and variance of a ranked
individual and on the mean and variance of the intervals between ranked individuals,"
Biometrika, 23, 364.

Tippett, L. H. C. (1925), "On the extreme individuals and the range of samples taken
from a normal population," Biometrika, 17, 364.

Yasukawa, K. (1926), "On the probable error of the mode of skew frequency distributions,"
Biometrika, 18, 263.


EXERCISES 
9.1. Show that the mean value of the variance is given exactly by

$$E(m_2) = \frac{n-1}{n}\,\mu_2$$

and that its variance is given exactly by

$$\operatorname{var}(m_2) = \left(\frac{n-1}{n}\right)^2 \frac{\mu_4 - \mu_2^2}{n} + \frac{2(n-1)}{n^3}\,\mu_2^2.$$

Hence verify that the formulae of this chapter as applied to the variance of a sample
are accurate to order $n^{-1}$.

9.2. In the height distribution of Table 1.7 it has been found that

$$m_2 = 6.616, \qquad m_3 = -0.207, \qquad m_4 = 137.689.$$

Regarding the distribution as a random sample from a population which is approximately
normal, show that $m_3$ does not differ significantly from zero (which, of course, must be so
if the assumption of normality is to be maintained) and that $m_4$ has a standard error of
about 4 per cent. of its value.

9.3. Verify that the standard error of the first decile in samples from a normal population
is $\dfrac{1.7094\,\sigma}{\sqrt{n}}$.


9.4. In the distribution of Australian marriages of Table 1.8 it has been found that 
the mean is 29.4 years, the standard deviation 8 years approximately. The median fre- 
quency is about 63,150. Taking this distribution to be a random sample, show that the 
standard error of the mean is 0.015 years and that of the median 0.043 years.

9.5. If a series of random samples of different sizes is drawn from a population in
which the proportion of members bearing an attribute A is $\varpi$, show that the variance of the
proportions of A in such sets is $\dfrac{\varpi(1 - \varpi)}{H}$, where H is the harmonic mean of the numbers
in the samples.


9.6. Show that the sampling variances of the first four cumulants, as calculated
from the moments, are given to order $n^{-1}$ by

$$\operatorname{var} \kappa_1 = \frac{1}{n}\,\kappa_2$$

$$\operatorname{var} \kappa_2 = \frac{1}{n}(\kappa_4 + 2\kappa_2^2)$$

$$\operatorname{var} \kappa_3 = \frac{1}{n}(\kappa_6 + 9\kappa_4\kappa_2 + 9\kappa_3^2 + 6\kappa_2^3)$$

$$\operatorname{var} \kappa_4 = \frac{1}{n}(\kappa_8 + 16\kappa_6\kappa_2 + 48\kappa_5\kappa_3 + 34\kappa_4^2 + 72\kappa_4\kappa_2^2 + 144\kappa_3^2\kappa_2 + 24\kappa_2^4).$$

9.7. If the variate range is divided into sub-ranges and the frequency of a large
sample falling into the pth range is $f_p$, show that

$$\operatorname{cov}(f_p, f_q) = -\frac{f_p f_q}{n}$$

and hence find expressions for the sampling variance of the rth moment about an arbitrary
point.

9.8. Show that in odd samples of n from a rectangular population of unit range
the sampling variance of the distribution of the median is given exactly by

$$\frac{1}{4(n + 2)}.$$


CHAPTER 10 
EXACT SAMPLING DISTRIBUTIONS 


10.1. The role of the sampling distribution in statistical inference has been indicated
in Chapter 8. In the present chapter we propose to give an account of the main methods
of finding such distributions when the population from which the sample was derived is
specified. It will, as usual, be assumed that the sampling is simple and random. Thus,
if the parent distribution is $dF(x)$ the simultaneous distribution of n values $x_1, x_2, \ldots, x_n$ is
$dF(x_1)\, dF(x_2) \ldots dF(x_n)$; and if z is a statistic

$$z = z(x_1, x_2, \ldots, x_n) \tag{10.1}$$

the distribution function of z is given by

$$F(z) = \int \ldots \int dF(x_1) \ldots dF(x_n), \tag{10.2}$$

the integration being taken over the domain of the x's such that $z(x_1, x_2, \ldots, x_n) \le z$.
Formally, (10.2) is the solution of our problem, which thus reduces to the purely
mathematical one of evaluating certain multiple integrals or sums. The methods with
which we are here concerned are fundamentally devices of various kinds to facilitate the
integrative process. They may be classified into four groups:
(a) straightforward evaluation of the integral (10.2) by ordinary analytical processes 
such as a convenient change of variable ; 
(b) the use of geometrical terminology to effect the same object and to avoid cumbrous 
analytical formulae ; 
(c) the use of characteristic functions; and 
(d) other analytical methods, including mathematical induction. 


10.2. As an illustration of the straightforward analytical approach, let us find the 
distribution of the sums of squares of n independent variables, each of which is distributed 
normally with unit variance and zero mean. The joint distribution of the n variables is 

m 


JE 
then the product of » quantities of type Va 2, that is to say 


dF = 


n 


sap- pa + «$E. шз) ух... EE (10.3) 
(27)? 
We require the sampling distribution of 
а= а +22 0...2. . © ` . e (10.4) 
We have thus to evaluate the multiple integral 
r-[. s f l. exp (— 422%) de, oes dx, 
(270) 


over the domain of as conditioned by (10.4). 
231 


232 EXACT SAMPLING DISTRIBUTIONS 


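Make the transformation to variables $z, \theta_1, \theta_2, \dots, \theta_{n-1}$:

$$\begin{aligned}
x_1 &= z^{1/2} \cos\theta_1 \cos\theta_2 \dots \cos\theta_{n-2} \cos\theta_{n-1} \\
x_2 &= z^{1/2} \cos\theta_1 \cos\theta_2 \dots \cos\theta_{n-2} \sin\theta_{n-1} \\
&\;\;\vdots \\
x_j &= z^{1/2} \cos\theta_1 \cos\theta_2 \dots \cos\theta_{n-j} \sin\theta_{n-j+1} \\
&\;\;\vdots \\
x_n &= z^{1/2} \sin\theta_1
\end{aligned}$$

The Jacobian of this transformation is given by

$$J = \frac{\partial(x_1, x_2, \dots, x_n)}{\partial(z, \theta_1, \dots, \theta_{n-1})}$$

which is equal to $\tfrac{1}{2} z^{\frac{n-2}{2}}$ times the determinant

$$\begin{vmatrix}
\cos\theta_1 \cos\theta_2 \dots \cos\theta_{n-1} & \cos\theta_1 \cos\theta_2 \dots \cos\theta_{n-2}\sin\theta_{n-1} & \dots & \sin\theta_1 \\
-\sin\theta_1 \cos\theta_2 \dots \cos\theta_{n-1} & -\sin\theta_1 \cos\theta_2 \dots \cos\theta_{n-2}\sin\theta_{n-1} & \dots & \cos\theta_1 \\
-\cos\theta_1 \sin\theta_2 \dots \cos\theta_{n-1} & -\cos\theta_1 \sin\theta_2 \dots \cos\theta_{n-2}\sin\theta_{n-1} & \dots & 0 \\
\vdots & \vdots & & \vdots \\
-\cos\theta_1 \cos\theta_2 \dots \sin\theta_{n-1} & +\cos\theta_1 \cos\theta_2 \dots \cos\theta_{n-1} & \dots & 0
\end{vmatrix}$$

Taking out common factors in columns we find that this determinant is equal to
$\cos^{n-1}\theta_1 \cos^{n-2}\theta_2 \dots \cos\theta_{n-1} \sin\theta_1 \sin\theta_2 \dots \sin\theta_{n-1}$ times

$$\begin{vmatrix}
1 & 1 & 1 & \dots & 1 \\
-\tan\theta_1 & -\tan\theta_1 & -\tan\theta_1 & \dots & \cot\theta_1 \\
-\tan\theta_2 & -\tan\theta_2 & -\tan\theta_2 & \dots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
-\tan\theta_{n-2} & -\tan\theta_{n-2} & \cot\theta_{n-2} & \dots & 0 \\
-\tan\theta_{n-1} & \cot\theta_{n-1} & 0 & \dots & 0
\end{vmatrix}$$

and, on subtracting each column from the preceding one, the determinant is found to reduce
to $\cos^{n-2}\theta_1 \cos^{n-3}\theta_2 \dots \cos\theta_{n-2}$.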
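Thus our integral becomes

$$F = \int \dots \int \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}z}\, \tfrac{1}{2} z^{\frac{n-2}{2}} \cos^{n-2}\theta_1 \cos^{n-3}\theta_2 \dots \cos\theta_{n-2}\, dz\, d\theta_1 \dots d\theta_{n-1}. \qquad (10.6)$$

The advantage of the transformation is that the limits of the integration become
simpler. z itself can vary from 0 to z and the θ's between fixed limits which do not
depend on z. Thus (10.6) divides into a product of integrals, those in θ being constant,
and the distribution function of z is a constant multiple of

$$\int_0^z e^{-\frac{1}{2}z} z^{\frac{n-2}{2}}\, dz.$$

Hence the distribution sought is

$$dF = \frac{1}{2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}\, e^{-\frac{1}{2}z} z^{\frac{n-2}{2}}\, dz, \qquad 0 \le z < \infty,$$

a Pearson Type III curve.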
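The result of 10.2 can be checked empirically. The following is a minimal sketch, not part of the original text: it draws sums of squares of n unit normal variates and compares the observed proportion falling below a chosen value with a numerical integral of the Type III frequency function. The function names, the seed, n = 4 and the cut-off 3 are illustrative assumptions.

```python
import math
import random

def sum_of_squares_samples(n, trials, seed=0):
    """Draw `trials` values of z = x_1^2 + ... + x_n^2 with x_j ~ N(0, 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]

def type3_density(z, n):
    """Frequency function e^{-z/2} z^{(n-2)/2} / (2^{n/2} Gamma(n/2))."""
    return math.exp(-0.5 * z) * z ** (0.5 * n - 1.0) / (2.0 ** (0.5 * n) * math.gamma(0.5 * n))

n = 4                      # illustrative sample size (assumed)
cutoff = 3.0               # illustrative value of z (assumed)
zs = sum_of_squares_samples(n, trials=20000)

# Observed proportion of sums of squares not exceeding the cut-off.
empirical = sum(z <= cutoff for z in zs) / len(zs)

# Midpoint-rule integral of the frequency function from 0 to the cut-off.
steps = 3000
width = cutoff / steps
theoretical = sum(type3_density((i + 0.5) * width, n) for i in range(steps)) * width

print(round(empirical, 3), round(theoretical, 3))
```

The two printed proportions should agree to within sampling error, which for 20,000 trials is of the order of a few thousandths.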


10.3. The essential feature of the change of variables is the simplification of the 
domain of integration as defined by the limits of the new variables. In general, we usually 
take the statistic whose sampling distribution is being sought to form one of the new variables 
and choose n — 1 others in any way which may be convenient to the particular problem. 
Then, if J is the Jacobian of the transformation, namely

$$J = \frac{\partial(x_1, \dots, x_n)}{\partial(z, \theta_1, \dots, \theta_{n-1})},$$

the integral (10.2) becomes

$$F(z) = \int \dots \int f(x_1) \dots f(x_n)\, J\, dz\, d\theta_1 \dots d\theta_{n-1},$$

$f(x_j)$ being the frequency function of the parent and $x_j$ being expressed in terms of z and
the θ's. The integration now takes place with respect to the θ's, which can usually be
chosen so as to vary between limits which are independent of z; and thus the indefinite
integral (10.2) is replaced by more easily calculable definite integrals.

As always in such cases J is subject to an ambiguity of sign which must be determined 
so as to make the transformed integral positive. The validity of the variate-transformation 
depends on the familiar conditions governing the change of variable in a multiple integral. 
For example, it is a sufficient condition that the new variables and their first derivatives 
shall be continuous in the x's and that J does not change sign in the domain of integration.*
Some further examples will make the general type of investigation clear. 


Example 10.1 
To find the distribution of the mean of a sample of n values $x_1 \dots x_n$ from the
distribution

$$dF = \frac{dx}{\pi(1 + x^2)}, \qquad -\infty < x < \infty.$$

The joint distribution is

$$dF = \frac{1}{\pi^n} \prod_{j=1}^{n} \frac{dx_j}{1 + x_j^2} \qquad (10.10)$$

and the statistic z is given by

$$nz = \sum_{j=1}^{n} x_j. \qquad (10.11)$$

We have to integrate (10.10) over a domain of x's subject to $\sum x \le nz$. Let us take new
variables $x_1 = x_1$, $x_2 = x_2$, ..., $x_{n-1} = x_{n-1}$ and

$$x_n = nz - x_1 - x_2 - \dots - x_{n-1}.$$


* See, for example, de la Vallée Poussin, Cours d'analyse infinitésimale, 1926, vol. 1, para. 285; vol. 2,
para. 18.




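Here J is evidently equal to the constant n. Our new variables $x_1 \dots x_{n-1}$ may extend
from $-\infty$ to $+\infty$ and the new variable z from $-\infty$ to z. We then have

$$F(z) = \frac{n}{\pi^n} \int_{-\infty}^{z} dz \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} \frac{dx_1 \dots dx_{n-1}}{(1 + x_1^2) \dots (1 + x_{n-1}^2)\{1 + (nz - x_1 - \dots - x_{n-1})^2\}} \qquad (10.12)$$

and the frequency function of z is given by the (n - 1)-fold multiple integral in $x_1 \dots x_{n-1}$
in (10.12). This integral may be evaluated by step-by-step integration. We have

$$\frac{1}{(x^2 + 1)\{r^2 + (a - x)^2\}} = \frac{1}{\{a^2 + (r + 1)^2\}\{a^2 + (r - 1)^2\}} \left[ \frac{2ax}{x^2 + 1} + \frac{a^2 + r^2 - 1}{x^2 + 1} + \frac{2a^2 - 2ax}{r^2 + (a - x)^2} + \frac{a^2 - r^2 + 1}{r^2 + (a - x)^2} \right].$$

Whence, integrating with respect to x from $-\infty$ to $+\infty$, we find on the right

$$\frac{1}{\{a^2 + (r + 1)^2\}\{a^2 + (r - 1)^2\}} \left[ a \log(x^2 + 1) - a \log\{r^2 + (a - x)^2\} + (a^2 + r^2 - 1) \tan^{-1} x + \frac{a^2 - r^2 + 1}{r} \tan^{-1} \frac{x - a}{r} \right]_{-\infty}^{\infty}$$

reducing to

$$\frac{\pi (r + 1)}{r\{a^2 + (r + 1)^2\}}. \qquad (10.13)$$

Thus in (10.12), taking $x = x_1$, $r = 1$, $a = nz - x_2 - \dots - x_{n-1}$, we find that the
(n - 1)-fold integral reduces to

$$\frac{n}{\pi^n} \cdot 2\pi \int \dots \int \frac{dx_2 \dots dx_{n-1}}{(1 + x_2^2) \dots (1 + x_{n-1}^2)\{2^2 + (nz - x_2 - \dots - x_{n-1})^2\}}.$$

Integrating with respect to $x_2, x_3, \dots, x_{n-1}$ successively, we reduce this eventually to

$$\frac{n^2}{\pi\{n^2 + (nz)^2\}} = \frac{1}{\pi(1 + z^2)}. \qquad (10.14)$$

Thus the distribution of z is given by

$$dF = \frac{dz}{\pi(1 + z^2)}, \qquad -\infty < z < \infty, \qquad (10.15)$$

and is thus the same as that of a single observation.

This is an interesting example of the failure of the Central Limit Theorem, the mean
of samples of n failing to tend to normality for large n. The second moment of the
distribution does not exist.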
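The conclusion of Example 10.1 lends itself to a simple sampling experiment. The sketch below is an illustration under stated assumptions, not part of the original text: it generates values from the distribution of the example by inverting its distribution function and compares the proportion of single observations lying between -1 and +1 with the same proportion for means of 50. The function names, the seeds and the choice n = 50 are hypothetical.

```python
import math
import random

def cauchy_variate(rng):
    """One value from dF = dx / (pi (1 + x^2)), by inverting the distribution function."""
    return math.tan(math.pi * (rng.random() - 0.5))

def sample_means(n, trials, seed=1):
    """Means of `trials` independent samples of n values each."""
    rng = random.Random(seed)
    return [sum(cauchy_variate(rng) for _ in range(n)) / n for _ in range(trials)]

def proportion_within_one(values):
    """Proportion of values in (-1, 1); the parent distribution gives exactly 1/2."""
    return sum(-1.0 < v < 1.0 for v in values) / len(values)

p_single = proportion_within_one(sample_means(1, 20000))
p_mean = proportion_within_one(sample_means(50, 20000))
print(p_single, p_mean)
```

If the mean tended to normality with shrinking dispersion, the second proportion would approach unity; under (10.15) both should remain near one half.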


Example 10.2 


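To find the distribution of a linear function of n independent variables $x_1 \dots x_n$,
where $x_j$ is distributed normally with zero mean and variance $v_j$.
Let the linear function be

$$z = a_1 x_1 + a_2 x_2 + \dots + a_n x_n. \qquad (10.16)$$

Then by a transformation $\xi_j = x_j / \sqrt{v_j}$ we have

$$z = a_1\sqrt{v_1}\,\xi_1 + a_2\sqrt{v_2}\,\xi_2 + \dots + a_n\sqrt{v_n}\,\xi_n \qquad (10.17)$$

and $\xi_j$ is now distributed with zero mean and unit variance. Our problem is thus reduced
to finding the distribution of a linear function of variables each of which is normally
distributed with zero mean and unit variance.

Consider a transformation of type

$$\begin{aligned}
\zeta_1 &= l_1\xi_1 + l_2\xi_2 + \dots + l_n\xi_n \\
\zeta_2 &= m_1\xi_1 + m_2\xi_2 + \dots + m_n\xi_n \\
&\;\;\vdots \\
\zeta_n &= p_1\xi_1 + p_2\xi_2 + \dots + p_n\xi_n
\end{aligned} \qquad (10.18)$$

and let us determine the l's ... p's such that

$$\begin{aligned}
l_j^2 + m_j^2 + \dots + p_j^2 &= 1, \qquad \text{all } j \\
l_j l_k + m_j m_k + \dots + p_j p_k &= 0, \qquad \text{all } j, k, \ j \ne k.
\end{aligned} \qquad (10.19)$$

This can always be done, for the conditions impose only $n + \tfrac{1}{2}n(n - 1)$ conditions on the
$n^2$ constants.
We have then

$$\sum_{j=1}^{n} \zeta_j^2 = (l_1\xi_1 + \dots + l_n\xi_n)^2 + \dots + (p_1\xi_1 + \dots + p_n\xi_n)^2 = \sum_{j=1}^{n} \xi_j^2$$

in virtue of (10.19). The joint distribution of the ξ's is by hypothesis

$$\frac{1}{(2\pi)^{n/2}} \exp\left(-\tfrac{1}{2}\sum \xi^2\right) \prod d\xi$$

and that of the ζ's is therefore

$$\frac{1}{(2\pi)^{n/2}} \exp\left(-\tfrac{1}{2}\sum \zeta^2\right) \frac{1}{J} \prod d\zeta \qquad (10.20)$$

where

$$J = \frac{\partial(\zeta_1, \dots, \zeta_n)}{\partial(\xi_1, \dots, \xi_n)}.$$

The determinant J is then, from (10.18),

$$\begin{vmatrix}
l_1 & l_2 & \dots & l_n \\
m_1 & m_2 & \dots & m_n \\
\vdots & \vdots & & \vdots \\
p_1 & p_2 & \dots & p_n
\end{vmatrix}$$

and multiplying this by the equal determinant

$$\begin{vmatrix}
l_1 & m_1 & \dots & p_1 \\
l_2 & m_2 & \dots & p_2 \\
\vdots & \vdots & & \vdots \\
l_n & m_n & \dots & p_n
\end{vmatrix}$$

we find, in virtue of (10.19), that the product is

$$\begin{vmatrix}
1 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \dots & 1
\end{vmatrix} = 1.$$

Thus $J = \pm 1$ and, the sign being taken so as to make the integral positive, (10.20) becomes

$$\frac{1}{(2\pi)^{n/2}} \exp\left(-\tfrac{1}{2}\sum \zeta^2\right) \prod d\zeta. \qquad (10.21)$$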




Now the ζ's may vary from $-\infty$ to $\infty$, and if we require the distribution of one of the
ζ's, say $\zeta_1$ $(= l_1\xi_1 + \dots + l_n\xi_n)$, we have to integrate over all values of ξ such that
$\sum l_j\xi_j \le \zeta_1$. This is equivalent to a range of $\zeta_1$ from $-\infty$ to $\zeta_1$ and of the other ζ's from
$-\infty$ to $+\infty$. Thus the integral of (10.21) becomes the product of (n - 1) definite integrals
each equal to $\int_{-\infty}^{\infty} e^{-\frac{1}{2}t^2}\, dt = \sqrt{2\pi}$ and the integral $\int_{-\infty}^{\zeta_1} e^{-\frac{1}{2}t^2}\, dt$, and hence reduces to

$$F(\zeta_1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\zeta_1} e^{-\frac{1}{2}t^2}\, dt. \qquad (10.22)$$

In other words, $\zeta_1$ is distributed normally with unit variance and zero mean. $\zeta_1$ is an
arbitrary linear function $\sum l_j\xi_j$ subject to the condition that $\sum l_j^2 = 1$. Referring to (10.17)
we see that the slightly more general linear function $z = \sum a_j x_j = \sum a_j\sqrt{v_j}\,\xi_j$ will be distributed
normally about zero mean with variance $\sum a_j^2 v_j$, for then $z / \sqrt{\sum a_j^2 v_j}$ has coefficients
$a_j\sqrt{v_j} / \sqrt{\sum a_j^2 v_j}$ $(= l_j)$ obeying the condition $\sum l_j^2 = 1$ and is distributed with unit variance.
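The conclusion of Example 10.2, that $z = \sum a_j x_j$ is normal about zero with variance $\sum a_j^2 v_j$, may be illustrated by simulation. The following is a hedged sketch only: the coefficients, variances, seed and function name are arbitrary assumptions chosen for the illustration and do not come from the text.

```python
import random

def linear_function_samples(a, v, trials, seed=2):
    """Draw z = sum a_j x_j, where the x_j are independent N(0, v_j)."""
    rng = random.Random(seed)
    sds = [vj ** 0.5 for vj in v]
    return [sum(aj * rng.gauss(0.0, sd) for aj, sd in zip(a, sds)) for _ in range(trials)]

a = [1.0, -2.0, 0.5]      # illustrative coefficients (assumed)
v = [1.0, 0.25, 4.0]      # illustrative variances (assumed)
zs = linear_function_samples(a, v, trials=40000)

mean = sum(zs) / len(zs)
variance = sum((z - mean) ** 2 for z in zs) / len(zs)
expected_variance = sum(aj * aj * vj for aj, vj in zip(a, v))

# For a normal variate about 68.27 per cent of values lie within one standard deviation.
within_one_sd = sum(abs(z) <= expected_variance ** 0.5 for z in zs) / len(zs)
print(mean, variance, expected_variance, within_one_sd)
```

The observed variance should lie close to $\sum a_j^2 v_j$, and the proportion within one standard deviation close to the normal value.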


The Geometrical Method 


10.4. A considerable amount of cumbrous analysis may usually be avoided by the
use of geometrical representation of the domain of integration. We may imagine the values
$x_1 \dots x_n$ attaching to any given sample as the co-ordinates of a point in an n-dimensional
Euclidean hyperspace. The function $dF(x_1) \dots dF(x_n)$ may then be regarded as the
density at the point and the total frequency between $z_1$ and $z_2$ will be the integral of this
density (the weight) in a region lying between the two loci $z(x_1 \dots x_n) = z_1$ and
$z(x_1 \dots x_n) = z_2$ which in general will be hypersurfaces in the n-fold space, i.e. will
themselves be spaces of (n - 1) dimensions. The distribution function of z will be the
total weight between the hypersurface corresponding to $z = -\infty$ and that corresponding
to z; and the frequency function will be the element of weight between the hypersurfaces
$z - \tfrac{1}{2}dz$ and $z + \tfrac{1}{2}dz$.


Example 10.3 


Consider again the problem of Example 10.2. In the n-fold ξ-space the density is
given by

$$\frac{1}{(2\pi)^{n/2}} \exp\left(-\tfrac{1}{2}\sum \xi^2\right).$$

The statistic z $(= \sum a_j\sqrt{v_j}\,\xi_j)$ determines a hyperplane

$$z = \sum a_j\sqrt{v_j}\,\xi_j \qquad (10.23)$$

and we have to find the total weight between this hyperplane and the corresponding
hyperplane at $-\infty$, i.e. the weight on one side (the "lower" side) of the hyperplane (10.23).
Now $\sum \xi^2$ is the square of the distance of the point $\xi_1 \dots \xi_n$ from the origin and is
therefore unchanged by any rotation of the co-ordinate axes. Choose such a rotation
which brings the axis of one variable perpendicular to the hyperplane (10.23), meeting it
in Q. Let P be the sample point $\xi_1 \dots \xi_n$ and O the origin. Then

$$\sum \xi^2 = OP^2 = OQ^2 + QP^2,$$

so that the density at P is

$$\frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}OQ^2 - \frac{1}{2}QP^2}.$$

For variation over the hyperplane $OQ^2$ is constant and the integral of $e^{-\frac{1}{2}QP^2}$ is thus a
constant independent of OQ. Hence the frequency function of z is given by

$$f(z) = k\, e^{-\frac{1}{2}OQ^2},$$

k being some constant.
But OQ is the distance from O to the hyperplane and is given by

$$OQ^2 = \frac{z^2}{\sum a_j^2 v_j}.$$

Hence

$$f(z) = k \exp\left\{-\frac{z^2}{2\sum a_j^2 v_j}\right\},$$

i.e. z is distributed normally with variance $\sum a_j^2 v_j$ about zero mean.
The reader will find it instructive to compare this example with the previous one.
They are, in effect, the same thing expressed in different language.


Example 10.4 

Consider again the illustration of 10.2. The elegance of the geometrical approach is
well brought out by the analogous derivation of the result there obtained.

In fact, our density function, as before, is given by

$$\frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}OP^2}.$$

We require the distribution of the statistic $z = OP^2$, and the density is obviously constant
over the surface z = constant, that is to say the (n - 1)-dimensional hypersphere. The
frequency function of z is then the integral of this constant density between the hyperspheres
z and z + dz, i.e. is proportional to $e^{-\frac{1}{2}OP^2}$ times the element of the volume of the
hypersphere, which itself is proportional to the nth power of the radius OP. Thus we have

$$dF = k\, e^{-\frac{1}{2}OP^2}\, OP^{\,n-2}\, dz = k\, e^{-\frac{1}{2}z} z^{\frac{n-2}{2}}\, dz,$$

giving, on evaluation of the constant,

$$dF = \frac{1}{2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}\, e^{-\frac{1}{2}z} z^{\frac{n-2}{2}}\, dz$$

as before.
Now suppose that the quantities $x_1 \dots x_n$, while still being normally distributed with
unit variance, are subject to p linear restrictions of type

$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = 0.$$

In the n-space the variables will then be constrained to lie on p hyperplanes. The first
will cut the hypersphere of constant density in a hypersphere of one lower dimension, also,
of course, of constant density; the second will cut this in a hypersphere of one lower
dimension still, and so on. The result of the linear restrictions will be to constrain the
variables to a hypersphere of p lower dimensions, and thus the distribution of z in these
circumstances will be as before, but with n - p instead of n, i.e.

$$dF = \frac{1}{2^{\frac{n-p}{2}}\,\Gamma\!\left(\frac{n-p}{2}\right)}\, e^{-\frac{1}{2}z} z^{\frac{n-p-2}{2}}\, dz. \qquad (10.24)$$
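As a numerical check on the form (10.24), the sketch below evaluates the frequency function directly and integrates it by the midpoint rule; the total frequency should be unity and the mean should equal n - p. This is an illustrative sketch only: the function names, the values n = 6 and p = 2, and the upper limit of integration are assumptions made for the example.

```python
import math

def constrained_density(z, n, p):
    """Equation (10.24): density of the sum of squares under p linear restrictions."""
    k = n - p
    return math.exp(-0.5 * z) * z ** (0.5 * k - 1.0) / (2.0 ** (0.5 * k) * math.gamma(0.5 * k))

def midpoint_integral(func, lower, upper, steps=20000):
    """Midpoint-rule approximation to the integral of func over (lower, upper)."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width

n, p = 6, 2                # illustrative values (assumed)
upper = 80.0               # the frequency beyond this point is negligible for these values
total = midpoint_integral(lambda z: constrained_density(z, n, p), 0.0, upper)
mean = midpoint_integral(lambda z: z * constrained_density(z, n, p), 0.0, upper)
print(total, mean)
```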


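Example 10.5. The sampling distribution of the mean and variance in normal samples
Writing $\bar{x}$ for the mean of a sample, we have, for the variance $s^2$,

$$ns^2 = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2.$$

In samples from a normal population with zero mean and unit variance the density at the
point $x_1 \dots x_n$ is proportional to

$$\exp\left(-\tfrac{1}{2}\sum x^2\right) = \exp\{-\tfrac{1}{2}(ns^2 + n\bar{x}^2)\}. \qquad (10.25)$$

Let us find the sampling distributions of s and $\bar{x}$. From (10.25) it is seen that the density
function can be expressed simply in terms of those quantities, and we then have to find some
transformation of the volume element $dx_1 \dots dx_n$.

In the n-space consider the unit vector whose direction cosines are $\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}, \dots, \frac{1}{\sqrt{n}}$,
say OQ where O is the origin. If P is the sample point, let PM be the perpendicular from
P on to OQ. Then the length of OM is

$$\frac{x_1}{\sqrt{n}} + \frac{x_2}{\sqrt{n}} + \dots + \frac{x_n}{\sqrt{n}} = \bar{x}\sqrt{n}.$$

The length of OP is $\sqrt{\sum x^2}$. Thus the length of PM is $(\sum x^2 - n\bar{x}^2)^{1/2} = s\sqrt{n}$.
The element of volume at P may be regarded as the product of an element along OQ and an
element of the perpendicular hyperplane through M. In that hyperplane the loci of constant
PM, as in the last example, are hyperspheres, and the element of volume is accordingly equal
to $k\, d\bar{x}\, s^{n-2}\, ds$, where k stands for constants which need not concern us since they are
independent of $\bar{x}$ and s. Hence

$$dF \propto \exp\{-\tfrac{1}{2}(ns^2 + n\bar{x}^2)\}\, s^{n-2}\, d\bar{x}\, ds \qquad (10.26)$$

and this splits into two factors

$$dF \propto e^{-\frac{1}{2}n\bar{x}^2}\, d\bar{x} \qquad (10.27)$$

$$dF \propto e^{-\frac{1}{2}ns^2} s^{n-2}\, ds. \qquad (10.28)$$

Thus in sampling from a normal population the distributions of mean and variance are
independent. Equation (10.27) is equivalent to the result found in Example 10.2.
Equation (10.28) is new. We have

$$dF \propto e^{-\frac{1}{2}ns^2} s^{n-3}\, d(s^2)$$

and, on evaluation of the constant,

$$dF = \frac{\left(\frac{n}{2}\right)^{\frac{n-1}{2}}}{\Gamma\!\left(\frac{n-1}{2}\right)}\, e^{-\frac{1}{2}ns^2} s^{n-3}\, d(s^2), \qquad 0 \le s^2 < \infty. \qquad (10.29)$$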


It is interesting to compare this with the distribution of the previous example. In the
latter case we found the distribution of the sum of squares of the variables measured from
a fixed point. In this case we have found the distribution of 1/n of the sum of the squares
measured from the sample mean. A comparison of the form (10.29) with that of (10.24)
shows that the distribution of variances is, except for constants, the same as that of sums
of squares when subject to one linear constraint.
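Two consequences of Example 10.5 can be examined by a sampling experiment: the mean and variance of normal samples should show no correlation, and $ns^2$ should average n - 1, as for a sum of squares under one linear constraint. The sketch below is illustrative and rests on stated assumptions: the sample size, number of trials, seed and function names are arbitrary, and absence of correlation is of course a weaker property than the independence established in the text.

```python
import random

def mean_and_variance(sample):
    """Return the sample mean and the variance s^2 with divisor n, as in the text."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / n
    return xbar, s2

def correlation(us, vs):
    """Product-moment correlation of two equally long lists."""
    m = len(us)
    mu, mv = sum(us) / m, sum(vs) / m
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    var_u = sum((u - mu) ** 2 for u in us)
    var_v = sum((v - mv) ** 2 for v in vs)
    return cov / (var_u * var_v) ** 0.5

rng = random.Random(3)
n, trials = 5, 20000       # illustrative values (assumed)
pairs = [mean_and_variance([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]
means = [pair[0] for pair in pairs]
variances = [pair[1] for pair in pairs]

r = correlation(means, variances)
average_ns2 = n * sum(variances) / trials
print(r, average_ns2)
```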


Example 10.6. "Student's" distribution
In the previous example we have

$$\frac{\bar{x}\sqrt{n}}{s\sqrt{n}} = \frac{OM}{PM} = \cot\phi,$$

where φ is the angle POM.

If, then, we define a statistic $z = \bar{x}/s$, z will be constant over the cone obtained by
rotating PO about the unit vector, keeping the angle φ constant. The distribution of z will
then be given by determining the weight between the cones defined by φ and φ + dφ.

Consider the intersection of these cones with the hypersphere of radius OP. They
will cut off an annulus on the sphere whose "content" (the n-dimensional analogue of
volume) will be proportional to

$$OP\, d\phi \cdot PM^{\,n-2} = OP^{\,n-1} \sin^{n-2}\phi\, d\phi.$$

The density function is constant and proportional to $e^{-\frac{1}{2}OP^2}$ on the hypersphere and thus
the total frequency between the cones will be proportional to

$$\int_0^{\infty} e^{-\frac{1}{2}OP^2}\, OP^{\,n-1} \sin^{n-2}\phi\, d\phi\, d(OP) \propto \sin^{n-2}\phi\, d\phi, \qquad 0 \le \phi \le \pi.$$

The distribution of z (= cot φ) is then given by

$$dF = \frac{k\, dz}{(1 + z^2)^{n/2}}$$

or, on evaluation of the constant,

$$dF = \frac{1}{B\!\left(\frac{n-1}{2}, \frac{1}{2}\right)} \frac{dz}{(1 + z^2)^{n/2}}. \qquad (10.30)$$

Since z is the ratio of two functions of the variables of unit dimension this distribution
holds for samples from a normal population irrespective of the scale, that is to say,
irrespective of the variance of the parent population.

The distribution is usually put in a slightly different form.
Put $t = z\sqrt{n - 1}$. (10.30) then becomes

$$dF = \frac{dt}{\sqrt{n - 1}\; B\!\left(\frac{n-1}{2}, \frac{1}{2}\right) \left(1 + \frac{t^2}{n - 1}\right)^{n/2}} \qquad (10.31)$$

$$\phantom{dF} = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\!\left(\frac{\nu}{2}\right)} \frac{dt}{\left(1 + \frac{t^2}{\nu}\right)^{\frac{\nu + 1}{2}}} \qquad (10.32)$$

where ν = n - 1.

This celebrated expression is known as "Student's" distribution after the nom de plume
of its discoverer (1908).* The distribution function may be evaluated from the incomplete
B-function, but special tables have been prepared. One such, due to "Student" himself,
is given as Appendix Table 3.
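The passage from the form (10.30) in z to the form in t is a simple change of variable and can be verified numerically. The sketch below is an illustration, not part of the original text: it evaluates both frequency functions at corresponding points and confirms that they agree once allowance is made for the factor $\sqrt{n - 1}$ arising from $dt = \sqrt{n - 1}\, dz$. The function names and the values n = 6 and t = 1.3 are arbitrary assumptions.

```python
import math

def student_density(t, nu):
    """The form in t, with nu = n - 1."""
    constant = math.gamma(0.5 * (nu + 1)) / (math.sqrt(nu * math.pi) * math.gamma(0.5 * nu))
    return constant * (1.0 + t * t / nu) ** (-0.5 * (nu + 1))

def z_density(z, n):
    """Equation (10.30): density of z = mean / s for samples of n."""
    beta = math.gamma(0.5 * (n - 1)) * math.gamma(0.5) / math.gamma(0.5 * n)
    return (1.0 + z * z) ** (-0.5 * n) / beta

n = 6                      # illustrative sample size (assumed)
nu = n - 1
t = 1.3                    # illustrative value of t (assumed)
z = t / math.sqrt(nu)

# Changing the variable from z to t = z sqrt(n - 1) divides the density by sqrt(n - 1).
density_in_t = student_density(t, nu)
density_from_z = z_density(z, n) / math.sqrt(nu)
print(density_in_t, density_from_z)
```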


Example 10.7. Distribution of the mean of samples from a rectangular population 
Consider now a sample of n values from the rectangular distribution

$$dF = dx, \qquad 0 \le x \le 1.$$

In the n-space the density function will be a constant everywhere inside a hypercube

$$0 \le x_j \le 1, \qquad j = 1, \dots, n, \qquad (10.33)$$

and zero elsewhere. The unit vector will be along the diagonal of this cube. If P is the
sample point $(x_1 \dots x_n)$ and PM the perpendicular on to this diagonal, then, as shown
in Example 10.5, $OM = \bar{x}\sqrt{n}$. Thus, for the distribution of $\bar{x}$ we require the element of
weight (which in this case is proportional to the element of volume) between the hyperplanes
$\bar{x}$ and $\bar{x} + d\bar{x}$; and this is equivalent to finding the content of the hyperplane (its "area")
cut off by the various faces of the hypercube. The complication of the problem arises from
the fact that as $\bar{x}$ increases this region changes its shape according to the number of edges
of the hypercube cut by the hyperplane.
Consider the "quadrants"

$$x_j \ge \epsilon_j, \qquad \epsilon_j = 0 \text{ or } 1, \qquad j = 1, \dots, n, \qquad (10.34)$$

whose corners are the corners of the hypercube. Any one of the corners may have 0 or
1 or 2 ... or n of its co-ordinates equal to unity and the rest zero. We divide the quadrants
into (n + 1) sets according as the corner has 0, 1, ... n of its co-ordinates equal to unity,
that is, according as

$$\sum_{j=1}^{n} \epsilon_j$$

is equal to 0, 1, ... n. A quadrant of the tth set may be called $Q_t$. There will be $\binom{n}{t}$
different $Q_t$'s.

Let S be any point of $Q_0$, i.e. any point whose co-ordinates are all ≥ 0,
* Strictly speaking, "Student's" distribution is that of (10.30), the modified form in t being
due to R. A. Fisher. The latter form is therefore sometimes referred to as Fisher's t-distribution.



and let just s of its co-ordinates be ≥ 1. Then S will belong to just $\binom{s}{0}$ $Q_0$'s, $\binom{s}{1}$ $Q_1$'s,
$\binom{s}{2}$ $Q_2$'s and so on. Now if s > 0,

$$\sum_{t=0}^{s} (-1)^t \binom{s}{t} = (1 - 1)^s = 0. \qquad (10.35)$$

Hence, if whenever a point belongs to a $Q_t$ we give it a density $(-1)^t$ and then sum over
all Q, the resultant density will be 1 or 0 according as the point belongs to the hypercube
or not.
Let the segment of the hyperplane

$$\sum_{j=1}^{n} x_j = z \qquad (10.36)$$

lying in $Q_0$ have content $V_n(z)$. Then the segment lying in any member of $Q_r$ will
have content $V_n(z - r)$, which is zero if r > z. Further, the segment of (10.36) lying in
the hypercube will have the content

$$\sum_{r=0}^{k} (-1)^r \binom{n}{r} V_n(z - r) \qquad (10.37)$$

where k = [z], the greatest integer less than z.

To find $V_n(z)$, let $V'_{n-1}(z)$ be the projection of $V_n(z)$ perpendicular to one of the axes,
so that

$$V_n(z) = \sqrt{n}\, V'_{n-1}(z).$$

Now $V'_n(z)$ is the content of the n-dimensional region bounded by (10.36) and the co-ordinate
hyperplanes, a region whose base is therefore of content $V_n(z)$. The perpendicular from
O to this base is $z/\sqrt{n}$. Hence

$$V'_n(z) = \frac{1}{n} \cdot \frac{z}{\sqrt{n}}\, V_n(z)$$

and

$$V_{n+1}(z) = \sqrt{n + 1}\, V'_n(z) = \sqrt{\frac{n + 1}{n}}\, \frac{z}{n}\, V_n(z). \qquad (10.38)$$

Since $V_2(z) = z\sqrt{2}$, repeated applications of this formula give

$$V_n(z) = \frac{\sqrt{n}\, z^{n-1}}{(n - 1)!}.$$

Substituting in (10.37) we find for the content of the region common to the hypercube
and the hyperplane

$$\frac{\sqrt{n}}{(n - 1)!} \sum_{r=0}^{k} (-1)^r \binom{n}{r} (z - r)^{n-1} \qquad (10.39)$$

for values of z between k and k + 1.
Since the hyperplanes z and z + dz are at a distance $dz/\sqrt{n}$ apart, the distribution of the
mean $m = z/n$ is given by

$$f(m) = \frac{n^n}{(n - 1)!} \sum_{r=0}^{k} (-1)^r \binom{n}{r} \left(m - \frac{r}{n}\right)^{n-1}, \qquad \frac{k}{n} \le m \le \frac{k + 1}{n}. \qquad (10.40)$$

This is the required distribution. It is unusual in consisting of n arcs of degree (n - 1)
in m, having (n - 1)-point contact at their joins, that is at the points $m = \frac{1}{n}, \frac{2}{n}, \dots, \frac{n-1}{n}$.

The distribution is symmetrical since the hyperplane z = constant is perpendicular to
the long diagonal, which itself is an axis of symmetry of the hypercube.

For particular values n = 2, 3, 4, (10.40) gives the following results for the frequency
function:

$$n = 2: \quad f(m) = \begin{cases} 4m, & 0 \le m \le \tfrac{1}{2} \\ 4(1 - m), & \tfrac{1}{2} \le m \le 1 \end{cases}$$

$$n = 3: \quad f(m) = \begin{cases} \tfrac{27}{2} m^2, & 0 \le m \le \tfrac{1}{3} \\ \tfrac{27}{2}\{m^2 - 3(m - \tfrac{1}{3})^2\}, & \tfrac{1}{3} \le m \le \tfrac{2}{3} \\ \tfrac{27}{2}(1 - m)^2, & \tfrac{2}{3} \le m \le 1 \end{cases}$$

$$n = 4: \quad f(m) = \begin{cases} \tfrac{128}{3} m^3, & 0 \le m \le \tfrac{1}{4} \\ \tfrac{128}{3}\{m^3 - 4(m - \tfrac{1}{4})^3\}, & \tfrac{1}{4} \le m \le \tfrac{1}{2} \\ \tfrac{128}{3}\{(1 - m)^3 - 4(\tfrac{3}{4} - m)^3\}, & \tfrac{1}{2} \le m \le \tfrac{3}{4} \\ \tfrac{128}{3}(1 - m)^3, & \tfrac{3}{4} \le m \le 1 \end{cases}$$

If the frequency curve be drawn it will be found to resemble a normal curve in appearance.
The distribution, of course, tends to normality as n increases in virtue of the Central
Limit Theorem.
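The piecewise form of the distribution of the mean in Example 10.7 is easily evaluated mechanically. The following sketch, offered as an illustration rather than as part of the original text, codes the alternating sum $\frac{n^n}{(n-1)!}\sum_{r=0}^{k}(-1)^r\binom{n}{r}(m - r/n)^{n-1}$ with k the integral part of nm, and compares it with the particular results quoted for n = 2 and n = 3. The function name and the trial values of m are assumptions.

```python
import math

def rectangular_mean_density(m, n):
    """Density of the mean of n values from the rectangular distribution on (0, 1)."""
    if not 0.0 <= m <= 1.0:
        return 0.0
    k = math.floor(n * m)  # number of terms beyond the first in the alternating sum
    total = sum((-1) ** r * math.comb(n, r) * (m - r / n) ** (n - 1) for r in range(k + 1))
    return n ** n / math.factorial(n - 1) * total

# First arc for n = 2, where the frequency function is 4m.
first_arc = rectangular_mean_density(0.25, 2)

# Middle arc for n = 3, where it is (27/2){m^2 - 3(m - 1/3)^2}.
middle_arc = rectangular_mean_density(0.5, 3)
middle_arc_quoted = 13.5 * (0.5 ** 2 - 3.0 * (0.5 - 1.0 / 3.0) ** 2)

print(first_arc, middle_arc, middle_arc_quoted)
```

The same function may be used to confirm the symmetry of the distribution about m = 1/2 and that the total frequency is unity.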


The Method of Characteristic Functions 


10.5. It has already been noted that the characteristic function of the sum of
n independent variables is the product of their characteristic functions. This simple
property enables us to find the sampling distribution of a wide class of statistics which
are expressible as sums, and particularly of the mean.

If we have a sample of n values from a population whose characteristic function is φ(t),
the characteristic function of their sum is $\phi^n$. Thus the distribution function of the
sum z is given by F(z) where

$$F(z) - F(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-itz}}{it}\, \phi^n\, dt \qquad (10.41)$$

and the frequency function is

$$f(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itz} \phi^n\, dt. \qquad (10.42)$$

The following examples will illustrate the power of these results.


Example 10.8. Distribution of the Mean for the Binomial
The characteristic function of the binomial $(q + p)^m$ is

$$(q + pe^{it})^m.$$

The c.f. of the sampling distribution of the sum of n values is then

$$(q + pe^{it})^{mn}$$

and that of the distribution of the mean ($\frac{1}{n}$ of that sum) is

$$(q + pe^{it/n})^{mn}.$$

But this is the c.f. of the binomial

$$(q + p)^{mn}, \qquad (10.43)$$

the interval being $\frac{1}{n}$ instead of unity; and hence this distribution is that of the mean.
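The argument of Example 10.8 can be confirmed by direct computation. The sketch below is an illustration under assumed values, not part of the original text: writing m for the index of the parent binomial (a symbol assumed here), it forms the c.f. of the mean as the nth power of the parent c.f. taken at t/n, and compares it with the c.f. obtained by summing directly over a binomial of index mn whose variate proceeds by intervals of 1/n. The numerical values of q, p, m, n and t are arbitrary.

```python
import cmath
import math

def binomial_cf(t, q, p, m):
    """Characteristic function (q + p e^{it})^m of a binomial of index m."""
    return (q + p * cmath.exp(1j * t)) ** m

def cf_of_mean_direct(t, q, p, m, n):
    """c.f. summed term by term over a binomial of index mn with interval 1/n."""
    size = m * n
    return sum(
        math.comb(size, r) * p ** r * q ** (size - r) * cmath.exp(1j * t * r / n)
        for r in range(size + 1)
    )

q, p, m, n, t = 0.7, 0.3, 4, 5, 0.9     # illustrative values (assumed)

# The c.f. of the mean is the product of n parent c.f.'s evaluated at t/n.
via_product = binomial_cf(t / n, q, p, m) ** n
direct = cf_of_mean_direct(t, q, p, m, n)
discrepancy = abs(via_product - direct)
print(discrepancy)
```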


Example 10.9. Distribution of the Mean for the Poisson Distribution
The characteristic function of the Poisson distribution whose general term is $\frac{e^{-\lambda}\lambda^r}{r!}$ is

$$\exp\{\lambda(e^{it} - 1)\}.$$

The c.f. of the mean is then

$$\exp\{n\lambda(e^{it/n} - 1)\}$$

and hence the distribution of the mean is the Poisson distribution, whose general term is

$$\frac{e^{-n\lambda}(n\lambda)^r}{r!}, \qquad (10.44)$$

the interval being $\frac{1}{n}$ instead of unity.

Example 10.10. Distribution of the Mean for the Normal Population
The characteristic function of the normal distribution

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\} dx$$

is $\exp(-\tfrac{1}{2}t^2\sigma^2 + it\mu)$.

The c.f. of the distribution of the mean of n values is then

$$\exp n\left\{-\frac{t^2\sigma^2}{2n^2} + \frac{it\mu}{n}\right\} = \exp\left\{-\frac{t^2\sigma^2}{2n} + it\mu\right\}. \qquad (10.45)$$

This is the c.f. of a normal distribution with mean μ and variance $\frac{\sigma^2}{n}$, which is therefore
the distribution required.




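Example 10.11. Distribution of the Mean for the Type III Population
The characteristic function of the distribution

$$dF = \frac{1}{a\,\Gamma(\gamma)}\, e^{-x/a} \left(\frac{x}{a}\right)^{\gamma - 1} dx, \qquad 0 \le x \le \infty, \quad a > 0,$$

is $\frac{1}{(1 - ita)^{\gamma}}$.
The c.f. of the distribution of the mean of n values is then $\frac{1}{\left(1 - \frac{ita}{n}\right)^{n\gamma}}$.
This is the c.f. of the distribution

$$dF = \frac{1}{\frac{a}{n}\,\Gamma(n\gamma)}\, e^{-nx/a} \left(\frac{nx}{a}\right)^{n\gamma - 1} dx. \qquad (10.46)$$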

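Example 10.12. Distribution of the Mean for the Rectangular Population
The characteristic function of the distribution dF = dx, 0 ≤ x ≤ 1, is

$$\int_0^1 e^{itx}\, dx = \frac{e^{it} - 1}{it}.$$

The c.f. of the mean of n values is then $\left\{\frac{e^{it/n} - 1}{it/n}\right\}^n$ and the frequency function is thus

$$f(\bar{x}) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it\bar{x}} \left\{\frac{e^{it/n} - 1}{it/n}\right\}^n dt. \qquad (10.47)$$

The integrand is everywhere holomorphic and the range of integration may then be
changed to the contour Γ consisting of the real axis from $-\infty$ to $-c$, the small semicircle
of radius c and centre at the origin, and the real axis from c to $\infty$. Thus

$$f(\bar{x}) = \frac{1}{2\pi} \int_{\Gamma} e^{-it\bar{x}} \left\{\frac{e^{it/n} - 1}{it/n}\right\}^n dt = \frac{n^n}{2\pi i^n} \sum_{j=0}^{n} (-1)^j \binom{n}{j} \int_{\Gamma} \frac{e^{it(1 - \bar{x} - j/n)}}{t^n}\, dt. \qquad (10.48)$$

Now

$$\int_{\Gamma} \frac{e^{igz}}{z^n}\, dz = 0 \ \text{ if } g > 0, \qquad = -\frac{2\pi i^n g^{n-1}}{(n - 1)!} \ \text{ if } g < 0.$$

This may be seen by integrating along a contour consisting of Γ and the infinite
semicircle above the real axis if g > 0 and below it if g < 0.
Substituting in (10.48) we find

$$f(\bar{x}) = \frac{n^n}{(n - 1)!} \sum_{j} (-1)^{j+1} \binom{n}{j} \left(1 - \bar{x} - \frac{j}{n}\right)^{n-1},$$

the summation extending over those values of j for which $1 - \bar{x} - j/n < 0$.

This, with a few changes of notation, is the same as (10.40).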


10.6. General expressions may also be derived for the distributions of geometric
means and the moments about fixed points.
In fact, if y = log x, the characteristic function of y is

$$\phi(t) = \int e^{it \log x}\, dF = \int x^{it}\, dF.$$

The distribution of the sum of n independent values of y, say nz, is then given by

$$F(nz) - F(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-itnz}}{it}\, \phi^n\, dt,$$

and the distribution of the mean is that of z. But z = log u, where u is the geometric
mean, and hence the distribution of u may be found.
The frequency function, when it exists, is

$$f(nz) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itnz} \phi^n\, dt.$$

Similarly the characteristic function of a power of the variate, say $x^r$, is given by

$$\phi(t) = \int e^{itx^r}\, dF$$

and thus the distribution of the rth moment, say z, by

$$F(nz) - F(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1 - e^{-itnz}}{it}\, \phi^n\, dt.$$


Example 10.13. Distribution of the Geometric Mean in Samples from a Rectangular Population
If the population is

$$dF = \frac{dx}{a}, \qquad 0 \le x \le a,$$

the characteristic function of log x is

$$\frac{1}{a} \int_0^a x^{it}\, dx = \frac{a^{it}}{1 + it}.$$

The frequency function of $u = \sum \log x$ is then given by

$$f(u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itu} a^{nit}}{(1 + it)^n}\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{it(n \log a - u)}}{(1 + it)^n}\, dt.$$

This integral may be evaluated in the manner of Example 10.12 and we find

$$f(u) = \frac{(n \log a - u)^{n-1}\, e^{-(n \log a - u)}}{(n - 1)!}, \qquad n \log a - u \ge 0,$$

whence, putting $z = e^{u/n}$, we find for the distribution of the geometric mean z

$$f(z) = \frac{n^n z^{n-1}}{a^n (n - 1)!} \left\{\log \frac{a}{z}\right\}^{n-1}. \qquad (10.51)$$


Example 10.14. Distribution of the Second-order Moment about the Population Mean in 
Samples from a Normal Population 


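If the distribution is

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx$$

the characteristic function of $x^2$ is

$$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx^2} e^{-\frac{x^2}{2\sigma^2}}\, dx = \frac{1}{(1 - 2\sigma^2 it)^{1/2}}.$$

The c.f. of the mean of n values, say z, is then

$$\frac{1}{\left(1 - \frac{2\sigma^2 it}{n}\right)^{n/2}} \qquad (10.52)$$

and the frequency function is

$$f(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itz}}{\left(1 - \frac{2\sigma^2 it}{n}\right)^{n/2}}\, dt.$$

This integral may be evaluated in the manner of the previous example, or the result written down
by noting that (10.52) is the characteristic function of the distribution

$$dF = \frac{1}{\Gamma\!\left(\frac{n}{2}\right)} \left(\frac{n}{2\sigma^2}\right)^{n/2} e^{-\frac{nz}{2\sigma^2}} z^{\frac{n-2}{2}}\, dz, \qquad (10.53)$$

a result which may be compared with that of Example 10.4, to which it is equivalent.

The Method of Induction

10.7. The distribution of the sum of two independent variates may be obtained
directly without the intervention of characteristic functions. If $F_1(x_1)$ and $F_2(x_2)$ are
the distribution functions, the distribution function of $z = x_1 + x_2$ is given by

$$F(z) = \iint dF_1\, dF_2, \qquad (10.54)$$

the domain of integration being that for which $x_1 + x_2 \le z$, i.e.

$$F(z) = \int_{-\infty}^{\infty} F_1(z - x_2)\, dF_2(x_2). \qquad (10.55)$$

If, further, F is differentiable, the frequency function of z is given by

$$f(z) = \int_{-\infty}^{\infty} f_1(z - x_2)\, f_2(x_2)\, dx_2, \qquad (10.56)$$

$f_1$ and $f_2$ being the frequency functions of $x_1$ and $x_2$.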

(10.56) can be used to obtain successively the distribution of the sum of any number 
of variables whose individual distributions are known. If all the variables have the same 
distribution the general form may be suggested when the results for two or three variates 
have been worked out. Its correctness can then be verified by induction. The following 


examples illustrate the method. 


Example 10.15 
Consider again the distribution

$$dF = \frac{dx}{\pi(1 + x^2)}, \qquad -\infty < x < \infty.$$

By (10.56) the distribution of the sum of two independent variables each of which has this
distribution has the frequency function

$$f(z) = \frac{1}{\pi^2} \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)\{1 + (z - x)^2\}} = \frac{2}{\pi(z^2 + 4)}.$$

This suggests the general form

$$\frac{n}{\pi(z^2 + n^2)}.$$

If this is correct, then the form for (n + 1) variables is

$$\frac{n}{\pi^2} \int_{-\infty}^{\infty} \frac{dx}{(1 + x^2)\{n^2 + (z - x)^2\}} = \frac{n + 1}{\pi\{z^2 + (n + 1)^2\}},$$

in virtue of (10.13). The result holds for n = 1, 2, and is therefore true in general.
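The first step of the induction in Example 10.15 may be checked by carrying out the integration of (10.56) numerically. The sketch below is illustrative only and is not part of the original text: it approximates the integral of the product of two frequency functions by the midpoint rule over a wide but finite range, and compares the outcome with the form $n/\{\pi(z^2 + n^2)\}$ for n = 2. The function names, the range of integration, the number of steps and the trial value of z are assumptions.

```python
import math

def cauchy_density(x, scale=1.0):
    """Frequency function scale / (pi (x^2 + scale^2)); scale = n gives the sum of n variates."""
    return scale / (math.pi * (x * x + scale * scale))

def convolve_at(z, f1, f2, half_range=2000.0, steps=400000):
    """Midpoint-rule approximation to (10.56): the integral of f1(z - x) f2(x) dx."""
    width = 2.0 * half_range / steps
    total = 0.0
    for i in range(steps):
        x = -half_range + (i + 0.5) * width
        total += f1(z - x) * f2(x)
    return total * width

z = 1.5                    # illustrative value (assumed)
two_variates = convolve_at(z, cauchy_density, cauchy_density)
suggested_form = cauchy_density(z, scale=2.0)
print(two_variates, suggested_form)
```

Replacing one of the two frequency functions by the form for n variates gives, in the same way, a numerical check on the inductive step from n to n + 1.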


Example 10.16 
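In Example 10.4 we found that the distribution of the sums of squares of n independent
normal variates is given by

$$dF = \frac{1}{2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}\, e^{-\frac{1}{2}z} z^{\frac{n-2}{2}}\, dz. \qquad (10.57)$$

Suppose we had surmised this form from an examination of a few cases for low n. Let
x be another variate distributed normally about zero mean with unit variance. We require
the distribution of $z + x^2$.

Let $x^2 = v$. Then

$$dF = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}v} v^{-\frac{1}{2}}\, dv.$$

Then, from (10.56) the frequency function of the distribution of u = z + v is given by

$$\frac{1}{\sqrt{2\pi}\; 2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)} \int_0^u e^{-\frac{1}{2}z} z^{\frac{n-2}{2}}\, e^{-\frac{1}{2}(u - z)} (u - z)^{-\frac{1}{2}}\, dz$$

$$= \frac{e^{-\frac{1}{2}u}}{\sqrt{2\pi}\; 2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)} \int_0^u z^{\frac{n-2}{2}} (u - z)^{-\frac{1}{2}}\, dz$$

$$= \frac{e^{-\frac{1}{2}u}\, u^{\frac{n-1}{2}}}{\sqrt{2\pi}\; 2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}\, B\!\left(\frac{n}{2}, \frac{1}{2}\right)$$

$$= \frac{1}{2^{\frac{n+1}{2}}\,\Gamma\!\left(\frac{n+1}{2}\right)}\, e^{-\frac{1}{2}u} u^{\frac{n-1}{2}},$$

which is the same as (10.57) with n + 1 for n.

Hence the distribution holds generally.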


The Distribution of a Ratio 
10.8. Cases not infrequently arise in which we wish to find the sampling distribution
of the ratio of two independent statistics, $z_1$, $z_2$. The problem becomes somewhat
complicated when the divisor $z_2$ may be negative, but relatively simple in the contrary case.

If $F_1$, $F_2$ are the distribution functions of $z_1$ and $z_2$ and $v = z_1/z_2$, then for the distribution
function of v we have

$$F(v) = \int_0^{\infty} \int_{-\infty}^{vz_2} dF_1\, dF_2 = \int_0^{\infty} F_1(vz_2)\, dF_2 \qquad (10.58)$$

or, in terms of frequency functions,

$$f(v) = \int_0^{\infty} z_2\, f_1(vz_2)\, f_2(z_2)\, dz_2. \qquad (10.59)$$


Example 10.17 
Consider again the distribution of the ratio $\bar{x}/s$ discussed in Example 10.6. Here $\bar{x}$ is
the mean of samples of n from a normal population and is thus distributed as

$$dF \propto e^{-\frac{n\bar{x}^2}{2\sigma^2}}\, d\bar{x};$$

s is distributed as

$$dF \propto e^{-\frac{ns^2}{2\sigma^2}} s^{n-2}\, ds,$$

as we have found in equation (10.29).

Then the distribution of $v = \bar{x}/s$ is, from (10.59), a constant times

$$\int_0^{\infty} s\, e^{-\frac{ns^2v^2}{2\sigma^2}}\, e^{-\frac{ns^2}{2\sigma^2}} s^{n-2}\, ds \propto \frac{1}{(1 + v^2)^{n/2}},$$

which then gives us the distribution (10.30) on the evaluation of the constant.
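The use of (10.59) in Example 10.17 can be followed through numerically, constants included. The sketch below is an illustration under stated assumptions and is not part of the original text: it takes unit parent variance, writes down the normalised frequency functions of the mean and of s, evaluates the integral (10.59) by the midpoint rule, and compares the result with (10.30). The function names, the upper limit of integration and the trial values n = 5 and v = 0.8 are arbitrary.

```python
import math

def mean_density(x, n):
    """Frequency function of the mean of n values from N(0, 1)."""
    return math.sqrt(n / (2.0 * math.pi)) * math.exp(-0.5 * n * x * x)

def s_density(s, n):
    """Frequency function of s, proportional to exp(-n s^2 / 2) s^(n-2), normalised on (0, inf)."""
    norm = 0.5 * (2.0 / n) ** (0.5 * (n - 1)) * math.gamma(0.5 * (n - 1))
    return math.exp(-0.5 * n * s * s) * s ** (n - 2) / norm

def ratio_density(v, n, upper=12.0, steps=60000):
    """Midpoint-rule approximation to (10.59) with z_1 the mean and z_2 = s."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * width
        total += s * mean_density(v * s, n) * s_density(s, n)
    return total * width

def z_density(z, n):
    """Equation (10.30)."""
    beta = math.gamma(0.5 * (n - 1)) * math.gamma(0.5) / math.gamma(0.5 * n)
    return (1.0 + z * z) ** (-0.5 * n) / beta

n, v = 5, 0.8              # illustrative values (assumed)
from_ratio_formula = ratio_density(v, n)
from_10_30 = z_density(v, n)
print(from_ratio_formula, from_10_30)
```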
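Example 10.18. Fisher's z-distribution

Suppose we have two independent samples of $n_1$ and $n_2$ members respectively from
normal populations with variances $\sigma_1^2$ and $\sigma_2^2$. The distributions of the sample variances
$s_1^2$ and $s_2^2$ $\left(s^2 = \frac{1}{n}\sum (x - \bar{x})^2\right)$ are then

$$dF \propto e^{-\frac{n_1 s_1^2}{2\sigma_1^2}} s_1^{n_1 - 2}\, ds_1, \qquad 0 \le s_1 < \infty$$

$$dF \propto e^{-\frac{n_2 s_2^2}{2\sigma_2^2}} s_2^{n_2 - 2}\, ds_2, \qquad 0 \le s_2 < \infty.$$

The distribution of the ratio $t = s_1/s_2$ is then, from (10.59), given by

$$f(t) \propto \int_0^{\infty} s_2 \exp\left(-\frac{n_1 s_2^2 t^2}{2\sigma_1^2}\right) (s_2 t)^{n_1 - 2} \exp\left(-\frac{n_2 s_2^2}{2\sigma_2^2}\right) s_2^{n_2 - 2}\, ds_2$$

$$= t^{n_1 - 2} \int_0^{\infty} \exp\left\{-\frac{1}{2}\left(\frac{n_1 t^2}{\sigma_1^2} + \frac{n_2}{\sigma_2^2}\right) s_2^2\right\} s_2^{n_1 + n_2 - 3}\, ds_2$$

or

$$f(t) \propto \frac{t^{n_1 - 2}}{\left(\frac{n_1 t^2}{\sigma_1^2} + \frac{n_2}{\sigma_2^2}\right)^{\frac{n_1 + n_2 - 2}{2}}}. \qquad (10.60)$$

This is usually expressed in a somewhat different form. Put

$$e^{2z} = \frac{n_1 (n_2 - 1) s_1^2}{n_2 (n_1 - 1) s_2^2} = \frac{n_1 (n_2 - 1)}{n_2 (n_1 - 1)}\, t^2.$$

We find for the frequency function of z

$$f(z) \propto \frac{e^{(n_1 - 1)z}}{\left\{\frac{(n_1 - 1) e^{2z}}{\sigma_1^2} + \frac{n_2 - 1}{\sigma_2^2}\right\}^{\frac{n_1 + n_2 - 2}{2}}}, \qquad -\infty < z < \infty,$$

or, writing $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$, and evaluating the constant term,

$$f(z) = \frac{2 \left(\frac{\nu_1}{\sigma_1^2}\right)^{\frac{\nu_1}{2}} \left(\frac{\nu_2}{\sigma_2^2}\right)^{\frac{\nu_2}{2}}}{B\!\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)} \cdot \frac{e^{\nu_1 z}}{\left(\frac{\nu_1 e^{2z}}{\sigma_1^2} + \frac{\nu_2}{\sigma_2^2}\right)^{\frac{\nu_1 + \nu_2}{2}}}. \qquad (10.61)$$

In particular, if $\sigma_1^2 = \sigma_2^2$ we get Fisher's z-distribution of half the logarithm of the ratio
of two variances from a normal population

$$f(z) = \frac{2\, \nu_1^{\frac{\nu_1}{2}} \nu_2^{\frac{\nu_2}{2}}}{B\!\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)} \cdot \frac{e^{\nu_1 z}}{(\nu_1 e^{2z} + \nu_2)^{\frac{\nu_1 + \nu_2}{2}}}. \qquad (10.62)$$

The distribution function of z may be obtained from tables of the incomplete B-function.
Special tables showing, for various values of $\nu_1$ and $\nu_2$, the values of z corresponding to
F(z) = 0.99 and 0.95, have been prepared and are given as Appendix Tables 4 and 5.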




10.9. Up to this point we have been mainly concerned with the distribution of a
single statistic compiled from the members of a sample which is random and simple. The
methods may, however, readily be generalized to obtain the simultaneous distribution of
several statistics. For example, if there are several statistics $z_1, z_2 \dots z_p$, and the joint
distribution of the sample values $x_1 \dots x_n$ is represented by $dF(x_1, \dots, x_n)$ the
characteristic function of the z's is given by

$$\phi(t_1, \dots, t_p) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} \exp(it_1 z_1 + \dots + it_p z_p)\, dF(x_1, \dots, x_n) \qquad (10.63)$$

and the frequency function of the z's (if it exists) by

$$f(z_1, \dots, z_p) = \frac{1}{(2\pi)^p} \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} \exp(-it_1 z_1 - \dots - it_p z_p)\, \phi(t_1, \dots, t_p)\, dt_1 \dots dt_p. \qquad (10.64)$$

Examples of the use of these results will occur in the sequel.


NOTES AND REFERENCES 


The geometrical method is largely due to R. A. Fisher, whose use of it to derive the
sampling distribution of the correlation coefficient (1915) is a beautiful example of the
power of the method (cf. Chapter 14). See also Uspensky (1937).

Some of the distributions derived in the foregoing examples are classical. For
"Student's" distribution see his paper of 1908 and Fisher's paper of 1925. The distribution
of the sums of squares of values from a normal population was discovered by Helmert
in 1876 but forgotten until Karl Pearson rediscovered it in 1900. The distribution of the
mean of samples from a rectangular population is traceable as far back as Lagrange
(Miscellanea Taurinensia, 1770-73), but was forgotten and rediscovered simultaneously by Hall
and Irwin (1927), the former using the geometrical method and the latter characteristic
functions. For the distribution of means from Pearson curves, see Irwin (1930). For
Fisher's z distribution, see his paper of 1915 and that of 1924. For the distribution of
a ratio, see Cramér (1937) (Exercises 10.8-10.11 below), Geary (1930), Fieller (1932),
and Nicholson (1941). The distribution of the ratio of two normal variables exhibits some
unusual features; it may, for example, be bimodal.


Cramér, H. (1937), Random Variables and Probability Distributions, Cambridge University
Press.

Fieller, E. C. (1932), "The distribution of an index in a normal bivariate population,"
Biometrika, 24, 428.

Fisher, R. A. (1915), "The frequency distribution of the values of the correlation coefficient
in samples from an indefinitely large population," Biometrika, 10, 507.

Fisher, R. A. (1924), "On a distribution yielding the error functions of several well-known statistics,"
Proc. International Math. Congress at Toronto, 805.

Fisher, R. A. (1925), "Applications of 'Student's' distribution," Metron, 5, No. 3, 90.

Geary, R. C. (1930), "The frequency distribution of the quotient of two normal variables,"
Jour. Roy. Statist. Soc., 93, 442.

Hall, P. (1927), "The distribution of means for samples of size N drawn from a population
in which the variate takes values between 0 and 1, all such values being equally
probable," Biometrika, 19, 240.

Irwin, J. O. (1927), "On the frequency-distribution of the means of samples from a population
having any law of frequency with finite moments," Biometrika, 19, 225,
and (1929), 21, 431.

Irwin, J. O. (1930), "On the frequency-distribution of the means of samples from populations
of certain of Pearson's types," Metron, 7, No. 4, 51.

Kullback, S. (1934), "An application of characteristic functions to the distribution problem
of statistics," Ann. Math. Statist., 5, 263.

Kullback, S. (1935), "On samples from a multivariate normal population," Ann. Math. Statist.,
6, 203.

Nicholson, C. (1941), "A geometrical analysis of the frequency distribution of the ratio
between two variables," Biometrika, 32, 16.

Pearson, Karl (1900), "On the criterion that a given system of deviations from the probable
. . . is such that it can be reasonably supposed to have arisen from random
sampling," Phil. Mag., 50, 157.

"Student" (1908), "The probable error of a mean," Biometrika, 6, 1.

Uspensky, J. V. (1937), Introduction to Mathematical Probability, McGraw-Hill, New York
and London.


EXERCISES 


10.1. Derive by the method of characteristic functions the expression for the sampling
distribution of the mean of samples from the population

$$dF = \frac{dx}{\pi(1 + x^2)}, \qquad -\infty \le x \le \infty.$$

10.2. Show that the distribution of the geometric mean g in samples of » from the . 


Type III population 


ағ Rata d 0 « 
= ж «v < с 
Гр) 
is = mgnp-l Y yeni ап-1 grt 
Г) Гр)" йт" I(t + 1) Ji; 


(Kullback, 1934.) 
10.3. Show that the difference of two values drawn at random from the Poisson
population whose general term is $e^{-\lambda}\lambda^j/j!$ is distributed in the form whose general term is
$e^{-2\lambda}I_d(2\lambda)$, where $d$ can take all integral values from $-\infty$ to $\infty$ and $I_d(2\lambda)$ is Bessel's
modified function of the first kind of order $d$ and argument $2\lambda$. (Cf. Example 4.5.)
(Irwin, 1937, Jour. Roy. Statist. Soc., 100, 415.)


10.4. Show that the distribution of the mean of samples of n from the Type II 


population 
dF oc a?-!(1 — zy? da p>0,0 <a <1 


is given by 


p т 
б n n Tp) о 0) S 
Ў) = зл? TO NS cos (n$) df, 
where J,(z) is the Bessel coefficient of order 7 in 2, 
(Irwin, 1927.) 




10.5. Show that the distribution of the geometric mean of n variables, one from each 
of the populations with frequency functions 


ME pii 
AERE EA aP' n е-= 


Гр) ° р n río ^—1 
Аш % 


is the same as the distribution of the arithmetic mean of n independent variables distributed 
in the first of these forms. 


(Kullback, 1934.) 


10.6. Show that the difference of two independent variates, z, each of which is dis- ' 
tributed in the Type III form 


—ryp-l 


To) — dx 


© етігі d 
o Га Frp” 


2p—1 


E 
Г) = m Kn, 
22 Г(р)Г(%) | r 
h x) i 1 function of second order and imaginary argument. 
ln ver eno Stouffer and David, 1932, Biometrika, 24, 293.) 


ағ =° 


has the frequency function 


10.7. Ifa frequency function is given as the sum of a number of terms of the Type A 


series А , 
d. a жү [E 
Ја) = «e = aun) +... + em) 
show that the sum S of n independent variates has a frequency function 
M Авт» (09 dms 
709) = «s + su) me. s sus x (5 
where X = oyn and 


n! 
А, = 2 аз" 
4 vau. s. HM ва .„. — 9). ' 


the summation being taken over all values of the »’s for which 
3, + 4, +. - thy, =j. 
(Baker, 1930, Ann. Math. Statist., 1, 199.) 


баг», 


10.8. A theorem of Cramér’s (1937) states that if two independent variables, v, and 
Za, with finite mean values, distribution functions F, and F, and characteristic functions 


$1, фа are such that F,(0) = 0, so that x, is non-negative, and f m di converges, then 
1 


the distribution function of v -2 is given by 
2 


ro e if HO hd д 


2л) 2 it 


t y 




and the frequency function, if it exists, by
ре , 
Hoo) = x ees to) dt. 


25i, 


Use this result to obtain the distributions of Examples 10.17 and 10.18. 


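10.9. Show that the ratio of two independent normal variables has frequency function

$$f(x) = \frac{1}{\sqrt{(2\pi)}}\,\frac{m_2\sigma_1^2 + m_1\sigma_2^2x}{(\sigma_1^2 + \sigma_2^2x^2)^{3/2}}\exp\left\{-\frac{1}{2}\,\frac{(m_2x - m_1)^2}{\sigma_1^2 + \sigma_2^2x^2}\right\},$$

where $m_1$, $\sigma_1$ are the mean and standard deviation of the first variate, $m_2$, $\sigma_2$ those of the
second variate, and it is assumed that $m_2$ is so large compared with $\sigma_2$ that the range of
the second variate is effectively positive.

Hence show that $\dfrac{m_2x - m_1}{\sqrt{(\sigma_1^2 + \sigma_2^2x^2)}}$ is normally distributed about zero mean with unit
variance.
(Geary, 1930.)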


10.10. Show that the ratio of two independent variables distributed as 
аР ос е"); — m yn-lga, 0< m, «zc «o 
ар œ е7"; — уң „үз—1Ду, 0< т, <x < ico 
has a frequency function 


_ у ет, dg e = Ne ) piri? 
d (pi — 1)! re + 23H 1 Ys ( + 22 


Уз p 


e ipic. | 
2 7 HOw 
V f, 


Ya 
yi eps £m-1 (» = (2) (рь B 1)e&m.-? ie! 
a (p, — 1)! 20 n ney 1 Уз ( т эү" 
= Ve , 


Ye 
where & = m, — mw. (This includes Fisher's z-distribution as a particular case.) 


10.11. Show that the ratio of two variates v -A where x, is distributed normally 


A 
with mean m, and variance c? and the second like a standard deviation in normal samples, 
Le. with distribution function given by 


dF x e76-"(s — mg)?! ds 0 «m, <s < о 
has a frequency function given by 
А jej Ј "m g+1 


ЈО) 


2 


+ 

= т» - A— 

сү/(2л)Г(р) = jony X E $0 ўоа(у a myer 
с 


2 


Where & = m; — MV. , ә 
(This includes “ Student’s” distribution as a particular case.) 


CHAPTER 11 


APPROXIMATIONS TO SAMPLING DISTRIBUTIONS 


11.1. In the previous chapter we have considered methods of deriving sampling 
distributions in an exact form when the parent population is completely specified. Those 
methods are not applicable when the parent is not completely known, and they may in 
any case lead to results which are difficult to apply in practice, e.g. by yielding an integral 
which has not been tabulated. In such cases we can frequently deal with the problem 
by finding approximate forms for the sampling distribution, particularly by ascertaining 
its lower moments and then fitting a tractable type of curve such as one of the Pearson 
class. 

A procedure of this kind has, in fact, already been considered in Chapter 9, wherein
it was seen that approximate expressions could be derived for the first and second moments
of sampling distributions in terms of the lower moments of the parent. When the sampling
distribution tends to normality this, in effect, solves our problem, for the first and second
moments determine a normal distribution. The methods of this chapter are really developments
of this idea. We shall discuss exact methods of finding the moments of sampling
distributions in terms of parent moments. Our results are important not only on their
own account, but in giving an accurate method of judging the degree of approximation
of the expressions for large $n$ discussed in Chapter 9. In particular we shall be able to take
up some points which had to be left on one side in that chapter, e.g. the rapidity with
which some functions of the moments such as $\sqrt{b_1}$ approach normality.


11.2. It is as well to recall that there are three different types of moment concerned
in the investigation: (a) the moments of the parent population, (b) the moments of the
sample and (c) the moments of the sampling distribution. They will be referred to as
parent-moments (parameters), sample-moments (moment-statistics) and sampling-moments
respectively. Similarly we shall consider parent-cumulants, sample-cumulants and
sampling-cumulants.


11.3. In Chapter 9 we obtained the exact results

$$E(m_r') = \mu_r', \tag{11.1}$$

$$E(m_2) = E\left\{\frac{1}{n}\Sigma(x^2) - \frac{1}{n^2}(\Sigma x)^2\right\}
= E\left\{\left(\frac{1}{n} - \frac{1}{n^2}\right)\Sigma x_j^2 - \frac{1}{n^2}\sum_{j \ne k} x_jx_k\right\}
= \frac{n-1}{n}\mu_2' - \frac{n(n-1)}{n^2}\mu_1'^2
= \frac{n-1}{n}\mu_2. \tag{11.2}$$

This is exact and may be compared with the approximate expression given by the methods
of Chapter 9, viz.

$$E(m_2) = \mu_2. \tag{11.3}$$

We might then proceed to find the second, third . . . sampling moments of the variance
and thus obtain more and more information about its sampling distribution. For example,
we have for the fourth moment

$$E(m_2^4) = E\left\{\frac{1}{n}\Sigma(x^2) - \frac{1}{n^2}(\Sigma x)^2\right\}^4
= \frac{1}{n^4}E\left[\{\Sigma(x^2)\}^4 - \frac{4}{n}\{\Sigma(x^2)\}^3(\Sigma x)^2 + \frac{6}{n^2}\{\Sigma(x^2)\}^2(\Sigma x)^4 - \frac{4}{n^3}\Sigma(x^2)(\Sigma x)^6 + \frac{1}{n^4}(\Sigma x)^8\right]. \tag{11.4}$$

We can then find the expectations of the individual terms by an easy extension of the
method already used. We express any power in terms of products of the type $\Sigma(x_j^p x_k^q \ldots x_l^r)$,
where $j \ne k \ne \ldots \ne l$; the mean value of such a product, the $x$'s being independent,
is $n(n-1)\ldots(n-t+1)\mu_p'\mu_q'\ldots\mu_r'$, where $t$ is the number of distinct subscripts. Without
loss of generality we may take our origin at the mean of the parent, so that $\mu_1' = 0$ and
other moments are those about the mean of the parent. The rest is mere algebra. For
example, for the first term in (11.4) we have

$$\{\Sigma(x^2)\}^4 = (x_1^2 + x_2^2 + \ldots + x_n^2)^4
= \Sigma x_j^8 + 4\Sigma x_j^6x_k^2 + 6\Sigma x_j^4x_k^2x_l^2 + 3\Sigma x_j^4x_k^4 + \Sigma x_j^2x_k^2x_l^2x_m^2. \tag{11.5}$$

The numerical coefficients require a little care. That of $\Sigma x_j^4x_k^4$, for example, is 3, not
6 as in the multinomial expansion of $(x_1^2 + \ldots + x_n^2)^4$, because $j$ and $k$ can be interchanged.
The mean value of (11.5) is then

$$n\mu_8 + 4n(n-1)\mu_6\mu_2 + 6n(n-1)(n-2)\mu_4\mu_2^2 + 3n(n-1)\mu_4^2 + n(n-1)(n-2)(n-3)\mu_2^4.$$

A similar treatment of the other terms in (11.4) leads eventually to the result


j E(mj) = E — ug)? Ss г (на — Ausus — 24и5из — 1504 + 48и + 96и, 30,4) 
M at — 40psi, — Әбрар — 5404 + 336p + 528%, — 30054) 
"s (61 — Обици — 176бизиз — 102и + 924u,u2 + 1932u2u, — 104414) 
^ “(Sit — 88j pis — 160 p53 — 9592 + 1050.18 + 136043: — 13954) 


4- lus — 928usus — Sls lts — 35uj + 420,445 + 560p5 12 — 630и) e (11.6) 
т 
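The expansion (11.5) and its mean value can be checked by brute force. The following Python sketch is only an illustration: the function names and the three-point parent are assumptions made for the check, not part of the text. It enumerates every sample of size n from a small discrete parent whose values are equally likely, and compares the exact mean of {Σ(x²)}⁴ with the five-term expression, the moments being taken about whatever origin the values are measured from.

```python
from fractions import Fraction
from itertools import product


def falling(n, q):
    """Return n(n-1)...(n-q+1), the number of ordered choices of q subscripts."""
    out = 1
    for i in range(q):
        out *= n - i
    return out


def mean_by_enumeration(values, n):
    """Exact mean of (sum of x^2)^4 over all equally likely samples of size n."""
    total = sum(sum(x * x for x in s) ** 4 for s in product(values, repeat=n))
    return Fraction(total, len(values) ** n)


def mean_by_formula(values, n):
    """Mean value of (11.5) written in terms of the parent moments."""
    def mu(r):
        return Fraction(sum(v ** r for v in values), len(values))
    return (n * mu(8)
            + 4 * falling(n, 2) * mu(6) * mu(2)
            + 6 * falling(n, 3) * mu(4) * mu(2) ** 2
            + 3 * falling(n, 2) * mu(4) ** 2
            + falling(n, 4) * mu(2) ** 4)


parent = [-1, 0, 2]   # hypothetical parent; any finite set of values will do
print(mean_by_enumeration(parent, 4) == mean_by_formula(parent, 4))
```

Exact rational arithmetic is used so that agreement is an identity rather than a numerical coincidence.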


11.4. Systematic investigations of the sampling moments on these lines (though by
a somewhat different method) were carried out by Tschuprow (1919) and, for the particular
case of the variance, by Church (1925), who corrected some misprints in Tschuprow's results.
Unfortunately the resulting formulae are exceedingly complicated (the above is one of
the simpler cases) and are obviously unsuitable for practical work.

It then began to be appreciated that their complexity might be due to the use of a 
special type of symmetric function of the observations, namely the moments, and the 
question arose whether other functions might have simpler properties. Thiele had already 
introduced the parameters which are now known as cumulants, and had defined some 
statistics which were the same functions of the moment-statistics as the cumulants are 
of the moments. He also gave some expressions for the sampling cumulants of these
functions. In 1928 C. C. Craig developed this work and gave a number of further results. 
Even these, however, were sufficiently complicated and were reached only after some 
labour, and Craig himself remarked that “ it rather seems that the best hopes of effectively 
further simplifying the problem of sampling for statistical characteristics lie either in the 
discovery of a new kind of symmetric functions of all the observations . . . or in the 
abandonment of the method of characterizing frequency functions by symmetric functions 
of the observations altogether.” 

About the same time R. A. Fisher discovered such a new kind of symmetric function, 
the k-statistics, and his remarkable paper of 1928 forms the basis of nearly all subsequent 
work on the subject. The new statistics have the valuable property of yielding particularly 
simple sampling formulae which can be obtained directly by combinatorial methods, obviating 
most of the algebraic labour inherent in the older methods. 


Seminvariant Statistics 

11.5. It will be observed that equation (11.6) does not contain the parent mean $\mu_1'$.
In deriving it we took our origin at the parent mean, which simplified the algebra
to some extent. The independence of $E(m_2^4)$ of this parent mean is, however, not due to
this accidental circumstance. In fact any transformation of the variate from one origin
to another leaves $m_2$ unchanged, for $m_2 = \frac{1}{n}\Sigma(x - m_1')^2$ and the transformation increases
each $x$ and $m_1'$ by the same amount, leaving their difference unaffected. Consequently
if $m_2$ is independent of the location of the origin, so must be its sampling moments. Thus
our sampling formulae are very much simplified if we use statistics which are independent
of the origin. In equation (11.6) there are terms corresponding to $\mu_8$, $\mu_6\mu_2$, $\mu_5\mu_3$, $\mu_4^2$, $\mu_4\mu_2^2$,
$\mu_3^2\mu_2$ and $\mu_2^4$. If we had to take account of possible terms in $\mu_1'$ there would be additional
terms such as $\mu_7\mu_1'$, $\mu_6\mu_1'^2$ and so on, our formula containing 22 types of term instead of only 7.


11.6. A statistic which is independent of the origin of calculation is said to be seminvariant.
The moment-statistics about the mean are seminvariant. We now consider a
second family of statistics $k_p$ ($p = 1, 2, \ldots$), symmetric polynomials in the observations
$x_1, \ldots, x_n$ such that the mean value of $k_p$ is the $p$th cumulant, i.e.

$$E(k_p) = \kappa_p. \tag{11.7}$$

Note first of all that $k_p$ is uniquely determined by this definition; for if there were
two functions $k_p$ and $k_p'$ obeying (11.7) their difference $k_p - k_p'$ would have a zero mean
value. But this difference is itself a symmetric function and can therefore be expressed
as the sum of terms $\Sigma x^p$, $\Sigma x_j x_k^{p-1}$, etc., and hence its mean value is a series of terms each
of which is a product of moments. The vanishing of this series would imply a relationship
among the moments which is impossible except perhaps for particular populations.
Hence $k_p - k_p'$ must vanish identically and thus $k_p = k_p'$.

Secondly, note that the $k$'s are in fact seminvariant, except for $k_1$ which is equal to
the mean itself. In fact, we have by Taylor's theorem

$$k_p(x_1 + h, x_2 + h, \ldots, x_n + h) = k_p(x_1, \ldots, x_n) + hDk_p(x_1, \ldots, x_n) + \frac{h^2}{2!}D^2k_p(x_1, \ldots, x_n) + \ldots \tag{11.8}$$

where

$$D = \frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2} + \ldots + \frac{\partial}{\partial x_n}.$$

Taking mean values, and remembering that $\kappa_p$ itself is independent of the origin, except
for $\kappa_1$, we have

$$\kappa_p = \kappa_p + hE(Dk_p) + \frac{h^2}{2!}E(D^2k_p) + \text{etc.} \tag{11.9}$$

Thus $E(Dk_p)$ and other terms on the right vanish separately, for (11.9) is an identity in $h$.
In virtue of the remark above, this implies that $Dk_p = 0$, $D^2k_p = 0$, and so on; and hence,
from (11.8),

$$k_p(x_1 + h, x_2 + h, \ldots, x_n + h) = k_p(x_1, x_2, \ldots, x_n),$$

i.e. $k_p$ is seminvariant. The exception to this rule is $k_1$, which has as its mean value $\kappa_1 = \mu_1'$
and thus

$$k_1 = \frac{1}{n}\Sigma(x). \tag{11.10}$$

11.7. We now proceed to find explicit expressions for the $k$-statistics in terms of the
observations $x_1, \ldots, x_n$. By definition $k_p$ is of degree $p$ in these observations (for $\kappa_p$ is
of order $p$ in the moments, that is, the sum of the orders of the moments comprising any
term in $\kappa_p$ is $p$). We may then write

$$k_p = \Sigma\,\Sigma\,(x_i^{p_1}x_j^{p_1}\ldots x_l^{p_s})\,A(p_1^{\pi_1}\ldots p_s^{\pi_s}) \tag{11.11}$$

where the second summation extends over all the ways of assigning the $\pi_1 + \pi_2 + \ldots + \pi_s$
subscripts (including permutations) from the $n$ available and the first summation extends
over all partitions of the number $p$, $(p_1^{\pi_1}\ldots p_s^{\pi_s})$. $A(p_1^{\pi_1}\ldots p_s^{\pi_s})$ is a number
depending on the partition.
We have

$$p_1\pi_1 + p_2\pi_2 + \ldots + p_s\pi_s = p \tag{11.12}$$

and define $\rho$ by

$$\pi_1 + \pi_2 + \ldots + \pi_s = \rho. \tag{11.13}$$

On taking mean values of (11.11) we have, since the $x$'s are independent,

$$\kappa_p = \Sigma\{(\mu_{p_1}')^{\pi_1}\ldots(\mu_{p_s}')^{\pi_s}AB\}, \tag{11.14}$$

where $B$ is the number of ways of picking out the $\rho$ subscripts from $n$, permutations allowed,
and is therefore equal to $n(n-1)\ldots(n-\rho+1) = n^{[\rho]}$.
Now from equation (3.31), we have

$$\kappa_p = p!\,\Sigma\left(\frac{\mu_{p_1}'}{p_1!}\right)^{\pi_1}\ldots\left(\frac{\mu_{p_s}'}{p_s!}\right)^{\pi_s}\frac{(-1)^{\rho-1}(\rho-1)!}{\pi_1!\ldots\pi_s!}, \tag{11.15}$$

the summation extending over all partitions subject to (11.12) and (11.13). On identifying
corresponding terms in (11.14) and (11.15) we find the values of the $A$'s and on substituting
in (11.11) obtain finally

$$k_p = \Sigma\,\frac{p!\,(-1)^{\rho-1}(\rho-1)!}{n^{[\rho]}\,(p_1!)^{\pi_1}\ldots(p_s!)^{\pi_s}\,\pi_1!\ldots\pi_s!}\,\Sigma(x_i^{p_1}\ldots x_l^{p_s}), \tag{11.16}$$

the explicit expression of $k_p$ in terms of the $x$'s.
We may notice an important simplification of this expression which is crucial in a 


discussion of the sampling properties of the k’s. Apart from factors in p and n a typical 
term in (11.16) may be written 


E: (E TP: qu 2) 1 
pi! pi! pil ПЛАТЕ 


where, it is to be remembered, permutations of the subscripts are allowed. There will 


be a term of this type corresponding to every partition of p into ws and of p into p’s. 
Consequently we may write 


— Yy-i(p — 1)! 
kp к= Ө RAS Conor cin МОМО 


E (алб) 


where there is a term in the second summation corresponding to every possible way of 
assigning the subscripts. In this assignment subscripts are regarded as distinct entities. 
For example, if from the n subscripts we choose p, to be 1, p, to be 2, . . . р, to bez, + 1, 
and so on, there will be as many different terms as there are ways of choosing p, from 
the 1°, and so on, i.e. 
p : 

(рз... (ndyemlb.. m 5 r : 7 (WLIS) 
In fact, (11.16) is a condensed form of (11.17) in which all the terms leading to the same 
x-product are added together, their number being given by (11.18). 


Expression of k-Statistics in terms of Symmetric Products and Sums 
11.8. Writing

$$[p_1^{\pi_1}p_2^{\pi_2}\ldots p_s^{\pi_s}] = \Sigma(x_i^{p_1}x_j^{p_1}\ldots x_l^{p_s}), \qquad i \ne j \ne \ldots \ne l, \tag{11.19}$$

so that, for instance,

$$[21] = \Sigma(x_i^2x_j), \qquad [2^21] = \Sigma(x_i^2x_j^2x_k),$$

we see that the mean value of $[p_1^{\pi_1}\ldots p_s^{\pi_s}]$ is $n^{[\rho]}\mu_{p_1}'^{\pi_1}\ldots\mu_{p_s}'^{\pi_s}$. We can then write down
the $k$'s in terms of the symmetric product sums $[p^\pi]$ at once from the expressions of cumulants
in terms of moments. For instance, from (3.33) we have $\kappa_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3$ and

$$k_3 = \frac{[3]}{n} - \frac{3[21]}{n(n-1)} + \frac{2[1^3]}{n(n-1)(n-2)}, \tag{11.20}$$

a result which, of course, can be obtained directly from (11.16). In fact, there are three
partitions of 3, $(3)$, $(21)$, and $(1^3)$. From (11.16) we then have

$$k_3 = \frac{1}{n}\,\frac{3!\,[3]}{3!} - \frac{3!\,[21]}{n(n-1)\,2!\,1!\,1!\,1!} + \frac{(-1)^2\,2!\,3!\,[1^3]}{n(n-1)(n-2)\,(1!)^3\,3!}
= \frac{[3]}{n} - \frac{3[21]}{n(n-1)} + \frac{2[1^3]}{n(n-1)(n-2)}$$

as before.
It is, however, more useful for practical calculation of the $k$-statistics to express them
in terms of the power sums defined by

$$s_r = \Sigma(x^r). \tag{11.21}$$

This can be done by expressing the product sums (11.19) in terms of power sums (a procedure
which may be facilitated by the use of tables of symmetric functions) or directly
as follows :—

Assume

$$k_3 = a_0s_3 + a_1s_2s_1 + a_2s_1^3.$$

Since $E(k_3) = \kappa_3 = \mu_3$ we have

$$\mu_3 = a_0E(s_3) + a_1E(s_2s_1) + a_2E(s_1^3).$$

Hence, for moments about an arbitrary point

$$\mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = a_0n\mu_3' + a_1\{n\mu_3' + n(n-1)\mu_2'\mu_1'\}
+ a_2\{n\mu_3' + 3n(n-1)\mu_2'\mu_1' + n(n-1)(n-2)\mu_1'^3\},$$

from which we find, identifying coefficients,

$$1 = n(a_0 + a_1 + a_2)$$
$$-3 = n(n-1)(a_1 + 3a_2)$$
$$2 = n(n-1)(n-2)a_2$$

whence, solving for $a_0$, $a_1$ and $a_2$, we find

$$k_3 = \frac{1}{n^{[3]}}(n^2s_3 - 3ns_2s_1 + 2s_1^3).$$

11.9. The first eight $k$-statistics in terms of the power sums are as follows :—

$$k_1 = \frac{1}{n}s_1$$

$$k_2 = \frac{1}{n^{[2]}}(ns_2 - s_1^2)$$

$$k_3 = \frac{1}{n^{[3]}}(n^2s_3 - 3ns_2s_1 + 2s_1^3) \tag{11.22}$$

$$k_4 = \frac{1}{n^{[4]}}\{(n^3 + n^2)s_4 - 4(n^2 + n)s_3s_1 - 3(n^2 - n)s_2^2 + 12ns_2s_1^2 - 6s_1^4\}$$

$$k_5 = \frac{1}{n^{[5]}}\{(n^4 + 5n^3)s_5 - 5(n^3 + 5n^2)s_4s_1 - 10(n^3 - n^2)s_3s_2 + 20(n^2 + 2n)s_3s_1^2
+ 30(n^2 - n)s_2^2s_1 - 60ns_2s_1^3 + 24s_1^5\}$$

$$k_6 = \frac{1}{n^{[6]}}\{(n^5 + 16n^4 + 11n^3 - 4n^2)s_6 - 6(n^4 + 16n^3 + 11n^2 - 4n)s_5s_1
- 15n(n-1)^2(n+4)s_4s_2 - 10(n^4 - 2n^3 + 5n^2 - 4n)s_3^2
+ 30(n^3 + 9n^2 + 2n)s_4s_1^2 + 120(n^3 - n)s_3s_2s_1 + 30(n^3 - 3n^2 + 2n)s_2^3
- 120(n^2 + 3n)s_3s_1^3 - 270(n^2 - n)s_2^2s_1^2 + 360ns_2s_1^4 - 120s_1^6\}$$
Б = (nt + 42n5 + 119n4 — 42n?)s; — 7(n5 + 42n4 + 119n3 — 42n?)s,s, 
ті 


ae 

— 21(n> + 12n* — 31n? + 18n?)s,5, — 35(n* + 5n? — бл?) 1S3 
+ 42(n* + 27n3 + 44n? — 12n)s,s? + 210(n* + 6n? — 13n? + бт)5 48281 
4 140(n4 + 5n? — 6n)s$s, + 210(n* — 3n? + 2n2)s,52 

— 210(n3 + 13n? + бп) 52 — 1260(n? + n? — 2n )8;8»8{ А 
— 630(n? — 3n? + 2n)s$s, + 840(n? + 4n )sast + 2520(n? — n)ss 
— 2520ns.s? + 72051} 


+ (11.22) 


k= aig (n + 99n9 + 757n5 + 141n4 — 398n? + 120n?)s, — S(n* + 99n5 + 757 n* 
+ 141n3 — 398n? + 120n)s;s, — 28(n* + 37n5 — 39n* — 157m? 
+ 278n? — 120n)s,s, — 56(n5 + 9n* — 23n* + 11123 — 218n? + 120n)s,85 
— 35(n* -- n5 + 33n4 — 12123 + 2065? — 120n)si + 56(n5 + 684 + 359n? 
— 8n? — 60n)s,5? + 336(n5 + 23n4 — 31n? — 23n? + 30n)s,5:5: 
-- 560(n5 + 5n* + 5n? + 5n? — 6n)s,555, + 420(n* + 2n* — 25n* 
-L 46n? — 24n)s,s2 + 560(n5 — 4n4 + 11n? — 20n? + 12n)ssss 
— 336(n* + 38n? + 99n? — 18n)s,s? — 2520(n* + 10n? — lin? + бт) 45.81 
— 1680(n* + 2n? + 7n? — 10n)s3s% — 5040(n* — 2n? — n? + 2n)s3525; 
-— 630(n* — 6n? + 11n? — 6n)s$ + 1680(n? + 17n? + 12n)s,st 
+ 13,440(n3 + 2n? — 3n)s,s,s + 10,080(n3 — 3n? + 2n)sis] 
— 6720(n? + 5n)s,s? — 25,200(n? — п)з28! + 20,16015,5] — 50405; } ] 


In particular, we have

$$k_1 = m_1'$$

$$k_2 = \frac{n}{n-1}m_2$$

$$k_3 = \frac{n^2}{(n-1)(n-2)}m_3 \tag{11.23}$$

$$k_4 = \frac{n^2}{(n-1)(n-2)(n-3)}\{(n+1)m_4 - 3(n-1)m_2^2\}$$

expressing the $k$'s in terms of the moment statistics.
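As a check on the algebra, the power-sum expression for $k_4$ and its expression in terms of the moment-statistics can be evaluated on the same sample, together with the seminvariant property of 11.6. The Python sketch below is illustrative; the function names and sample values are assumptions made for the check.

```python
from fractions import Fraction


def power_sum(xs, r):
    """s_r, the sum of the r-th powers of the observations."""
    return sum(Fraction(x) ** r for x in xs)


def k4_from_power_sums(xs):
    """k_4 written in terms of the power sums s_1, ..., s_4."""
    n = len(xs)
    s1, s2, s3, s4 = (power_sum(xs, r) for r in (1, 2, 3, 4))
    num = ((n**3 + n**2) * s4 - 4 * (n**2 + n) * s3 * s1
           - 3 * (n**2 - n) * s2**2 + 12 * n * s2 * s1**2 - 6 * s1**4)
    return num / (n * (n - 1) * (n - 2) * (n - 3))


def k4_from_moments(xs):
    """k_4 written in terms of the moment-statistics m_2 and m_4."""
    n = len(xs)
    mean = power_sum(xs, 1) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    factor = Fraction(n * n, (n - 1) * (n - 2) * (n - 3))
    return factor * ((n + 1) * m4 - 3 * (n - 1) * m2**2)


xs = [2, 3, 5, 7, 11]                       # hypothetical observations
shifted = [x + 100 for x in xs]             # change of origin by h = 100
print(k4_from_power_sums(xs) == k4_from_moments(xs))
print(k4_from_power_sums(shifted) == k4_from_power_sums(xs))
```

Both forms require a sample of at least four observations, since the divisor $n^{[4]}$ vanishes otherwise.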


11.10. There is a well-known theorem of symmetric functions which states that any
rational integral algebraic symmetric function of $x_1, \ldots, x_n$ can be expressed uniquely,
rationally, integrally and algebraically in terms of the symmetric sums $s_r$. It can thus be
so expressed in terms of the $k$'s, for from equations such as (11.22) the $s$'s can be so expressed
in terms of the $k$'s. Thus an investigation of the sampling constants of any symmetric
function expressible in terms of rational integral algebraic symmetric functions can be
translated into an investigation concerning the $k$'s.

To round off this account of the relationship between the $k$'s and the $s$'s we may refer
to two interesting operational properties. Write $K_p$ for the same function of the differential
operators $\dfrac{\partial}{\partial x_1}, \ldots, \dfrac{\partial}{\partial x_n}$ as $k_p$ is of the $x$'s and $S_p$ for the same function of the operators as
$s_p$ is of the $x$'s. Then


= р! 
Кз, = p! e «t Soest is оо 


Kaos + 2 
where (p, . . . Pm) is any partition of p other than р itself; and 


A 
p 0) MI J^ d (оо 


i) 
ар 3. 
Methods of proof and applications of these results are given in the exercises at. the 
end of the chapter. 


Sampling Cumulants of k-Statistics 

11.11. The problem of determining the sampling moments or the sampling cumulants
of $k$-statistics is that of finding mean values of powers and products of those statistics.
To any number $a$ with partition $(a_1^{\alpha_1}a_2^{\alpha_2}\ldots a_s^{\alpha_s})$ there will correspond a moment

$$\mu'(a_1^{\alpha_1}\ldots a_s^{\alpha_s}) = E(k_{a_1}^{\alpha_1}\ldots k_{a_s}^{\alpha_s}) \tag{11.26}$$

and a cumulant $\kappa(a_1^{\alpha_1}\ldots a_s^{\alpha_s})$ related to the moments by the identity (cf. equation (3.54))

$$\Sigma\,\kappa(a_1^{\alpha_1}\ldots a_s^{\alpha_s})\frac{t_1^{\alpha_1}\ldots t_s^{\alpha_s}}{\alpha_1!\ldots\alpha_s!} = \log\left\{\Sigma\,\mu'(a_1^{\alpha_1}\ldots a_s^{\alpha_s})\frac{t_1^{\alpha_1}\ldots t_s^{\alpha_s}}{\alpha_1!\ldots\alpha_s!}\right\}. \tag{11.27}$$

For example, the fourth cumulant of $k_2$ will correspond to the fourth moment of $k_2$,
which is the mean value of $k_2^4$. These quantities will be written $\kappa(2^4)$ and $\mu'(2^4)$, in accordance
with (11.26). Again the cumulant $\kappa(32)$ corresponds to the moment $\mu'(32)$, the mean
value of $k_3k_2$ or their covariance in their joint sampling distribution. Generally, in the
simultaneous distribution of the $k$'s there will be a separate formula of degree $a$ for every
partition of $a$.

Now the product $k_{a_1}^{\alpha_1}\ldots k_{a_s}^{\alpha_s}$ is homogeneous and of total degree $a$ in the $x$'s.
Hence, when mean values are taken $\mu'(a_1^{\alpha_1}\ldots a_s^{\alpha_s})$ will be homogeneous and of total order
$a$ in the parent $\mu$'s. Since the $\kappa$'s themselves are of homogeneous order in the $\mu$'s it follows
that $\kappa(a_1^{\alpha_1}\ldots a_s^{\alpha_s})$ is of homogeneous order in the $\kappa$'s. Hence we get the first rule for
the sampling of $k$-statistics (which is true of seminvariants generally) :—

Rule 1. $\kappa(a_1^{\alpha_1}\ldots a_s^{\alpha_s})$ consists of the sum of terms each of which, except for constants,
is a product of parent $\kappa$'s of order $a$.

For instance, $\kappa(2^4)$ is of total order 8 and is therefore the sum of terms in $\kappa_8$, $\kappa_6\kappa_2$,
$\kappa_5\kappa_3$, $\kappa_4^2$, $\kappa_4\kappa_2^2$, $\kappa_3^2\kappa_2$ and $\kappa_2^4$. Similarly $\kappa(32)$ will contain a term in $\kappa_5$ and one in $\kappa_3\kappa_2$ and no others.
As seen in the next rule, no terms in $\kappa_1$ appear (as again is true of seminvariants generally).

Rule 2. No term in $\kappa(a_1^{\alpha_1}\ldots a_s^{\alpha_s})$ contains $\kappa_1$, except $\kappa(1)$ itself.

This follows as in 11.5. The $k$-statistics are seminvariant and hence their sampling
distribution cannot depend on the variable quantity $\kappa_1$. The exception occurs when we
are dealing with the only statistic which is dependent on the origin, namely $k_1$, and here
$\kappa(1) = \kappa_1$, as is evident from the definitions.


11.12. We now enunciate and illustrate the rules by which the terms in $\kappa(a_1^{\alpha_1}\ldots a_s^{\alpha_s})$
can be found. As the proof of the validity of the rules is difficult to grasp until their nature
has been comprehended we defer a proof until later in the chapter.

To find the term in $\kappa_{b_1}^{\beta_1}\ldots\kappa_{b_t}^{\beta_t}$ in $\kappa(a_1^{\alpha_1}\ldots a_s^{\alpha_s})$ consider the two-way array

$$\begin{array}{cccc|c}
\cdot & \cdot & \ldots & \cdot & b_1 \\
\cdot & \cdot & \ldots & \cdot & b_1 \\
\vdots & \vdots & & \vdots & \vdots \\
\cdot & \cdot & \ldots & \cdot & b_t \\
\hline
a_1 & a_1 & \ldots & a_s & a
\end{array} \tag{11.28}$$

where there is a row corresponding to every $\kappa$ in the term $\kappa_{b_1}^{\beta_1}\ldots\kappa_{b_t}^{\beta_t}$ and a column
corresponding to every part in $\kappa(a_1^{\alpha_1}\ldots a_s^{\alpha_s})$. Consider the various ways in which the
body of the table can be completed by the insertion of numbers whose row and column
sums are the respective $b$ and $a$ numbers; e.g. if we are seeking the coefficient of $\kappa_6\kappa_2^2$ in
$\kappa(4^22)$ we shall consider such arrays as

$$\begin{array}{ccc|c} 2 & 2 & 2 & 6 \\ 1 & 1 & \cdot & 2 \\ 1 & 1 & \cdot & 2 \\ \hline 4 & 4 & 2 & 10 \end{array}
\qquad
\begin{array}{ccc|c} 3 & 2 & 1 & 6 \\ 1 & 1 & \cdot & 2 \\ \cdot & 1 & 1 & 2 \\ \hline 4 & 4 & 2 & 10 \end{array}
\qquad
\begin{array}{ccc|c} 3 & 3 & \cdot & 6 \\ 1 & \cdot & 1 & 2 \\ \cdot & 1 & 1 & 2 \\ \hline 4 & 4 & 2 & 10 \end{array} \tag{11.29}$$

Then the rules by which these arrays give the coefficients of $\kappa_{b_1}^{\beta_1}\ldots\kappa_{b_t}^{\beta_t}$ are as follows :

Rule 3. Every array in which the numbers in the body of the array fall into two or
more blocks, each confined to separate rows or columns, is to be ignored.
For instance, in the foregoing example

$$\begin{array}{ccc|c} 4 & 2 & \cdot & 6 \\ \cdot & 2 & \cdot & 2 \\ \cdot & \cdot & 2 & 2 \\ \hline 4 & 4 & 2 & 10 \end{array}$$

is to be ignored, since the 2 × 2 block in the top left-hand corner has no row or column
number in common with the entry in the bottom right-hand corner.

Rule 4. Subject to the ignoration of terms enjoined by Rule 3, to the coefficient of
$\kappa_{b_1}^{\beta_1}\ldots\kappa_{b_t}^{\beta_t}$ in $\kappa(a_1^{\alpha_1}\ldots a_s^{\alpha_s})$ there will be a contribution corresponding to each way of
completing the array (11.28). Such of these as do not vanish are composed of a numerical
coefficient multiplied by a function of $n$.

Rule 5. The numerical coefficient is the number of ways in which the column totals,
considered as composed of distinct individuals, can be allocated to form the array concerned,
divided by $\beta_1!\,\beta_2!\ldots\beta_t!$.

Rule 6. The function of $n$, called the pattern function, depends only on the configuration
of zeros in the array, not on the actual numbers composing it or on the row and column
totals. The function is given by considering the separations of the rows into distinct
groups or separates.

(i) With one separate there is associated the number $n$, with two separates
$n(n-1)$, . . ., with $q$ separates $n(n-1)\ldots(n-q+1)$.

(ii) In each separation we count the number of separates in which a particular column
is represented by a non-zero entry. If in $p$ separates, we assign the factor
$$\frac{(-1)^{p-1}(p-1)!}{n(n-1)\ldots(n-p+1)}.$$

(iii) This is done for each column.

(iv) The various factors given by (ii) and (iii) are multiplied together for each separation,
multiplied by the factor appropriate under (i) and the results summed to give
the pattern function.

Rule 7. Any array containing a row which consists of a single non-zero entry has
a vanishing pattern function and is to be ignored.

Rule 8. Any array containing a column which consists of a single non-zero entry
has a pattern function $\dfrac{1}{n}$ times that of the array obtained by omitting that column.

Rule 9. Any array the non-zero elements of which consist of two groups connected
only by a single column has a vanishing pattern function and is to be ignored.


Example 11.1

As an illustration of these rules (which are not as difficult as they look), suppose we
seek for the coefficient of $\kappa_6\kappa_2^2$ in $\kappa(4^22)$. If the reader will write down the thirty or so possible
arrays with column totals 4, 4, 2 and row totals 6, 2, 2, he will find that the only ones which
do not vanish are those of (11.29) and permutations of rows and columns with the same
sum, namely

$$\begin{array}{ccc|c} 2 & 2 & 2 & 6 \\ 1 & 1 & \cdot & 2 \\ 1 & 1 & \cdot & 2 \\ \hline 4 & 4 & 2 & 10 \\ & (a) & & \end{array}
\quad
\begin{array}{ccc|c} 3 & 2 & 1 & 6 \\ 1 & 1 & \cdot & 2 \\ \cdot & 1 & 1 & 2 \\ \hline 4 & 4 & 2 & 10 \\ & (b) & & \end{array}
\quad
\begin{array}{ccc|c} 2 & 3 & 1 & 6 \\ 1 & 1 & \cdot & 2 \\ 1 & \cdot & 1 & 2 \\ \hline 4 & 4 & 2 & 10 \\ & (c) & & \end{array}
\quad
\begin{array}{ccc|c} 3 & 2 & 1 & 6 \\ \cdot & 1 & 1 & 2 \\ 1 & 1 & \cdot & 2 \\ \hline 4 & 4 & 2 & 10 \\ & (d) & & \end{array}$$

$$\begin{array}{ccc|c} 2 & 3 & 1 & 6 \\ 1 & \cdot & 1 & 2 \\ 1 & 1 & \cdot & 2 \\ \hline 4 & 4 & 2 & 10 \\ & (e) & & \end{array}
\quad
\begin{array}{ccc|c} 3 & 3 & \cdot & 6 \\ 1 & \cdot & 1 & 2 \\ \cdot & 1 & 1 & 2 \\ \hline 4 & 4 & 2 & 10 \\ & (f) & & \end{array}
\quad
\begin{array}{ccc|c} 3 & 3 & \cdot & 6 \\ \cdot & 1 & 1 & 2 \\ 1 & \cdot & 1 & 2 \\ \hline 4 & 4 & 2 & 10 \\ & (g) & & \end{array} \tag{11.30}$$

With practice the reader will find it unnecessary to write down arrays such as (c), (d)
and (e), which are merely obtained from (b) by permuting rows and columns, but for clarity
at this stage they have been set out in full. There is one trap here to be particularly
noticed. In array (b) the two columns summing to 4 and the two rows summing to 2 are
different, and their permutations result in 4 different arrays. But in array (f), though
the rows and columns are different, there are only 2 different arrays.

Each of these arrays contributes to the coefficient required. Consider first of all that
from (a). The numerical coefficient is
$$\left(\frac{4!}{2!\,1!\,1!}\right)\left(\frac{4!}{2!\,1!\,1!}\right)\frac{1}{2!} = 72.$$
The first factor in brackets is the number of ways of allocating 4 individuals in the partition 2, 1, 1, similarly
for the second, and we divide by 2! since there are 2 members of the row totals the same,
this being the only $\beta$ factor.

Under Rule 8, the pattern function is $\dfrac{1}{n}$ times that of

$$\begin{array}{cc} \times & \times \\ \times & \times \\ \times & \times \end{array}$$

There are five separations of this, one of one separate, three of two separates and one of
three separates. The contributions respectively under Rule 6 will be found to be

$$n\cdot\frac{1}{n}\cdot\frac{1}{n} = \frac{1}{n},$$

$$3n(n-1)\left\{\frac{-1}{n(n-1)}\right\}^2 = \frac{3}{n(n-1)},$$

$$n(n-1)(n-2)\left\{\frac{(-1)^2\,2!}{n(n-1)(n-2)}\right\}^2 = \frac{4}{n(n-1)(n-2)}.$$

The sum of these is $\dfrac{n}{(n-1)(n-2)}$ and hence the contribution from array (a) in (11.30) is

$$\frac{72}{(n-1)(n-2)}.$$

Now for arrays (b) to (e), which have all the same numerical factor and the same pattern
function and can therefore be considered together. For any one the numerical factor is

$$\left(\frac{4!}{3!\,1!}\right)\left(\frac{4!}{2!\,1!\,1!}\right)\left(\frac{2!}{1!\,1!}\right)\frac{1}{2!} = 48,$$

and that of the four together is thus 192.
Under Rule 6 the pattern function will depend on the configuration

$$\begin{array}{ccc} \times & \times & \times \\ \times & \times & \cdot \\ \cdot & \times & \times \end{array}$$

where × stands for a non-zero entry and a period for a zero entry. There are five separations
of this, one of one separate, three of two separates, and one of three separates. The
contribution from the first is

$$n\cdot\frac{1}{n}\cdot\frac{1}{n}\cdot\frac{1}{n} = \frac{1}{n^2},$$

for each column has a non-zero entry in the separate. The contribution from the three
separations given respectively by isolating the first, second and third row will be found to be

$$n(n-1)\left\{\frac{-1}{n^3(n-1)^3} + \frac{1}{n^3(n-1)^2} + \frac{1}{n^3(n-1)^2}\right\} = \frac{2n-3}{n^2(n-1)^2}.$$

The contribution from the separation of three separates is

$$n(n-1)(n-2)\cdot\frac{2}{n(n-1)(n-2)}\cdot\frac{-1}{n(n-1)}\cdot\frac{-1}{n(n-1)} = \frac{2}{n^2(n-1)^2}.$$

The pattern function is the sum of these three contributions and is thus $\dfrac{1}{(n-1)^2}$.

The contribution from arrays (f) and (g) in (11.30) will be found to be $\dfrac{32}{(n-1)^2}$.

Hence, adding all the contributions together, we find that the coefficient of $\kappa_6\kappa_2^2$ in
$\kappa(4^22)$ is

$$\frac{72}{(n-1)(n-2)} + \frac{192}{(n-1)^2} + \frac{32}{(n-1)^2} = \frac{8(37n - 65)}{(n-1)^2(n-2)},$$

as shown in equation (11.62) below.
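The rules lend themselves to mechanical application. The Python sketch below is one possible reading of Rules 3, 5 and 6, offered as an illustration under stated assumptions rather than as the author's procedure: every array with the required margins is enumerated, disconnected arrays are dropped (Rule 3), the numerical coefficient is computed as a product of multinomial coefficients divided by the $\beta$ factorials (Rule 5), and the pattern function is summed over all separations of the rows (Rule 6). Rules 7 to 9 are not coded separately, on the assumption that the pattern function of Rule 6 vanishes of its own accord in those cases. All function names are hypothetical, and $n$ must be at least the number of rows.

```python
from fractions import Fraction
from math import factorial


def falling(n, q):
    """n(n-1)...(n-q+1)."""
    out = 1
    for i in range(q):
        out *= n - i
    return out


def set_partitions(items):
    """Yield every separation of `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):          # join an existing group
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part              # or start a new group


def pattern_function(array, n):
    """Rule 6: sum over all separations of the rows."""
    total = Fraction(0)
    for separation in set_partitions(list(range(len(array)))):
        term = Fraction(falling(n, len(separation)))        # Rule 6 (i)
        for c in range(len(array[0])):
            # p = number of separates in which column c is represented
            p = sum(any(array[r][c] for r in group) for group in separation)
            term *= Fraction((-1) ** (p - 1) * factorial(p - 1), falling(n, p))
        total += term
    return total


def is_connected(array):
    """Rule 3: True unless the entries fall into two or more separate blocks."""
    reached, frontier = {0}, [0]
    while frontier:
        r = frontier.pop()
        for c, v in enumerate(array[r]):
            if v:
                for r2 in range(len(array)):
                    if array[r2][c] and r2 not in reached:
                        reached.add(r2)
                        frontier.append(r2)
    return len(reached) == len(array)


def arrays_with_margins(row_totals, col_totals):
    """Yield every array of non-negative integers with the given margins."""
    def column(total, room):
        if len(room) == 1:
            if total <= room[0]:
                yield (total,)
            return
        for v in range(min(total, room[0]) + 1):
            for rest in column(total - v, room[1:]):
                yield (v,) + rest

    def build(cols_left, room):
        if not cols_left:
            if not any(room):
                yield []
            return
        for col in column(cols_left[0], room):
            left = tuple(r - v for r, v in zip(room, col))
            for rest in build(cols_left[1:], left):
                yield [col] + rest

    for cols in build(tuple(col_totals), tuple(row_totals)):
        yield [list(row) for row in zip(*cols)]


def coefficient(row_totals, col_totals, n):
    """Coefficient of the kappa-product (row totals) in kappa(column totals)."""
    beta = 1
    for b in set(row_totals):
        beta *= factorial(row_totals.count(b))              # Rule 5 divisor
    total = Fraction(0)
    for array in arrays_with_margins(row_totals, col_totals):
        if not is_connected(array):
            continue                                        # Rule 3
        ways = 1
        for c, a in enumerate(col_totals):
            ways *= factorial(a)
            for row in array:
                ways //= factorial(row[c])                  # Rule 5 numerator
        total += Fraction(ways, beta) * pattern_function(array, n)
    return total


n = 10
print(coefficient((6, 2, 2), (4, 4, 2), n))   # compare with 8(37n - 65)/((n-1)^2 (n-2))
```

Row totals are passed as a tuple such as `(6, 2, 2)` for $\kappa_6\kappa_2^2$ and column totals as `(4, 4, 2)` for $\kappa(4^22)$. Exact fractions are used so that the value for a chosen $n$ can be compared directly with the closed form of the example.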


11.13. Rule 10. The expression for any $\kappa(a_1^{\alpha_1}\ldots)$ which contains a unit part may
be obtained from that without the part by (1) dividing throughout by $n$ and (2) increasing
the suffix of one of the $\kappa$'s by unity in every possible way.

For example, it may be shown that

$$\kappa(2^2) = \frac{\kappa_4}{n} + \frac{2\kappa_2^2}{n-1}.$$

Hence

$$\kappa(2^21) = \frac{\kappa_5}{n^2} + \frac{4\kappa_3\kappa_2}{n(n-1)},$$

$$\kappa(2^21^2) = \frac{\kappa_6}{n^3} + \frac{4\kappa_3^2}{n^2(n-1)} + \frac{4\kappa_4\kappa_2}{n^2(n-1)},$$

and so on.
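Rule 10 is easily mechanised. In the Python sketch below, which is an illustration only, a formula is held as a dictionary whose keys are the suffixes of the $\kappa$'s in each term and whose values are the coefficients for a chosen $n$; the representation and the name `add_unit_part` are assumptions introduced here.

```python
from fractions import Fraction


def add_unit_part(terms, n):
    """Apply Rule 10: divide by n and raise one kappa suffix in every possible way.

    `terms` maps a tuple of kappa suffixes, e.g. (2, 2) for kappa_2^2,
    to its coefficient for the chosen sample size n.
    """
    out = {}
    for suffixes, coef in terms.items():
        for i in range(len(suffixes)):
            raised = list(suffixes)
            raised[i] += 1
            key = tuple(sorted(raised, reverse=True))
            out[key] = out.get(key, Fraction(0)) + coef / n
    return out


n = 10
k22 = {(4,): Fraction(1, n), (2, 2): Fraction(2, n - 1)}    # kappa(2^2)
k221 = add_unit_part(k22, n)                                 # kappa(2^2 1)
k2211 = add_unit_part(k221, n)                               # kappa(2^2 1^2)
print(k221)
print(k2211)
```

The term in $\kappa_2^2$ contributes twice to $\kappa_3\kappa_2$, once for each factor whose suffix is raised, which is how the coefficient 4 arises.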


11.14. The reader may be inclined to doubt whether this rather elaborate com- 
binatorial procedure represents much of an advance on the straightforward algebraical 
approach considered earlier in the chapter. A few trials of the two methods in particular 
cases will soon convert him to the former. The division of the coefficients into a numerical 
factor and a pattern function greatly simplifies the method and in fact all the functions 
likely to be required for practical purposes have been tabulated by Fisher (1928) or can 
be derived therefrom by an iterative process given by Fisher and Wishart (cf. Exercise 11.11). 


Example 11.2
To find the variance of the second moment statistic $m_2$.
From (11.23) we have

$$k_2 = \frac{n}{n-1}\,m_2.$$

Hence

$$\operatorname{var} m_2 = \left(\frac{n-1}{n}\right)^2 \operatorname{var} k_2 = \left(\frac{n-1}{n}\right)^2 \kappa(2^2).$$

$\kappa(2^2)$ consists of two terms, one in $\kappa_4$ and one in $\kappa_2^2$. The only array contributing to the first is

    2  2 | 4

with a numerical factor unity and a pattern function $\dfrac{1}{n}$. The arrays giving the second
are of type

    .  . | 2
    .  . | 2
    2  2 | 4

If any entry in this were a 2 the row in which it appeared would contain only a single
entry and hence the array would vanish. The only contributing array is therefore

    1  1 | 2
    1  1 | 2
    2  2 | 4

The numerical coefficient is $\left(\dfrac{2!}{1!\,1!}\right)^2 \dfrac{1}{2!} = 2$. The pattern function will be found to be
$\dfrac{1}{n-1}$. Hence

$$\operatorname{var} m_2 = \left(\frac{n-1}{n}\right)^2\left\{\frac{1}{n}(\mu_4 - 3\mu_2^2) + \frac{2}{n-1}\,\mu_2^2\right\}
= \frac{(n-1)^2}{n^3}\,\mu_4 + \frac{(3-n)(n-1)}{n^3}\,\mu_2^2.$$

As n becomes large this result tends to

$$\frac{1}{n}(\mu_4 - \mu_2^2),$$

confirming the approximation given by equation (9.9).
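Since the result of Example 11.2 is exact, it can be checked by brute force on a small discrete parent. The sketch below is illustrative only: the three-point population and the function names are arbitrary assumptions. It enumerates every equally likely sample of size n drawn with replacement, so that the sample values are independent as the theory requires.

```python
from fractions import Fraction
from itertools import product


def central_moment(values, r):
    """r-th central moment of a population with equally likely `values`."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** r for v in values) / len(values)


def exact_var_m2(values, n):
    """Variance of m2 over all len(values)**n equally likely samples."""
    m2s = []
    for sample in product(values, repeat=n):
        mean = Fraction(sum(sample), n)
        m2s.append(sum((x - mean) ** 2 for x in sample) / n)
    first = sum(m2s) / len(m2s)
    second = sum(m * m for m in m2s) / len(m2s)
    return second - first ** 2


def formula_var_m2(values, n):
    """var m2 = (n-1)^2 mu4 / n^3 + (3-n)(n-1) mu2^2 / n^3."""
    mu2 = central_moment(values, 2)
    mu4 = central_moment(values, 4)
    return (Fraction((n - 1) ** 2, n ** 3) * mu4
            + Fraction((3 - n) * (n - 1), n ** 3) * mu2 ** 2)


population = [0, 1, 3]   # arbitrary skew three-point parent (an assumption)
```

For this parent and n = 3 the enumeration covers 27 samples, and the two functions should return the same rational number.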


Example 11.3

To find the third moment of $k_2$ we require $\kappa(2^3)$. This will be the sum of terms in
$\kappa_6$, $\kappa_4\kappa_2$, $\kappa_3^2$ and $\kappa_2^3$.

The coefficient of the first is $\dfrac{1}{n^2}$. For the second we have to consider the array

    1  1  2 | 4
    1  1  . | 2
    2  2  2 | 6

all others vanishing except the two equivalent partitions obtained when the column containing
the entry 2 is in the first or second place. The numerical factor is then

$$3\left(\frac{2!}{1!\,1!}\right)^2 = 12.$$

The pattern function is $\dfrac{1}{n}$ times that of

    x  x
    x  x

i.e. is $\dfrac{1}{n(n-1)}$. The coefficient of $\kappa_4\kappa_2$ is then $\dfrac{12}{n(n-1)}$.

For the term in $\kappa_3^2$ the only contributory array is

    1  1  1 | 3
    1  1  1 | 3
    2  2  2 | 6

with a factor $\left(\dfrac{2!}{1!\,1!}\right)^3 \dfrac{1}{2!} = 4$ and pattern function $\dfrac{n-2}{n(n-1)^2}$.
For the last term we have to consider the array

    1  1  . | 2
    1  .  1 | 2
    .  1  1 | 2
    2  2  2 | 6

with a numerical coefficient 8 and a pattern function $\dfrac{1}{(n-1)^2}$. Collecting terms together
we get

$$\kappa(2^3) = \frac{\kappa_6}{n^2} + \frac{12\kappa_4\kappa_2}{n(n-1)} + \frac{4(n-2)}{n(n-1)^2}\,\kappa_3^2 + \frac{8}{(n-1)^2}\,\kappa_2^3.$$

This is also the value of the third moment $\mu(2^3)$ measured about the mean of the sampling
distribution, $\kappa_2$. We see that if the parent is normal the third moment reduces to $\dfrac{8\kappa_2^3}{(n-1)^2}$,
i.e. is of order $n^{-2}$, indicating a rapid tendency towards symmetry.
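The expression for $\kappa(2^3)$ can be checked in the same exact fashion as a variance, by enumerating all samples from a small discrete parent and computing the third central moment of $k_2$ directly. This is only a sketch under stated assumptions: the parent is arbitrary, the function names are hypothetical, and the relations $\kappa_4 = \mu_4 - 3\mu_2^2$ and $\kappa_6 = \mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3$ are the standard ones, not quoted in this passage.

```python
from fractions import Fraction
from itertools import product


def central_moments(values, upto):
    """Central moments mu_2..mu_upto of a population with equally likely values."""
    mean = Fraction(sum(values), len(values))
    return {r: sum((v - mean) ** r for v in values) / len(values)
            for r in range(2, upto + 1)}


def exact_third_moment_k2(values, n):
    """Third central moment of k2 over all equally likely samples of size n."""
    k2s = []
    for sample in product(values, repeat=n):
        mean = Fraction(sum(sample), n)
        k2s.append(sum((x - mean) ** 2 for x in sample) / (n - 1))
    centre = sum(k2s) / len(k2s)
    return sum((k - centre) ** 3 for k in k2s) / len(k2s)


def formula_kappa_2_cubed(values, n):
    """kappa(2^3) from the parent cumulants, as collected in Example 11.3."""
    mu = central_moments(values, 6)
    k2, k3 = mu[2], mu[3]
    k4 = mu[4] - 3 * mu[2] ** 2
    k6 = mu[6] - 15 * mu[4] * mu[2] - 10 * mu[3] ** 2 + 30 * mu[2] ** 3
    return (k6 / n ** 2
            + 12 * k4 * k2 / (n * (n - 1))
            + 4 * (n - 2) * k3 ** 2 / (n * (n - 1) ** 2)
            + 8 * k2 ** 3 / (n - 1) ** 2)


population = [0, 0, 1]   # arbitrary skew parent (an assumption), i.e. P(x = 1) = 1/3
```

For n = 2 the term in $\kappa_3^2$ drops out because of the factor (n − 2); taking n = 3 exercises all four terms.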


Example 11.4

Few things illustrate the usefulness of expressing the formulae in terms of cumulants
and the power of the combinatorial method better than the simplification imported when
the parent population is normal. In this case only terms in $\kappa_2$ survive, all higher cumulants
vanishing.

As an illustration let us prove that $\kappa(pq) = 0$ for normal samples unless $p = q$.

The only term which can appear in $\kappa(pq)$ is $\kappa_2^{\frac{1}{2}(p+q)}$ and evidently, if p + q is odd, even
this cannot do so. If p + q is even we have to consider the array

    .  . | 2
    .  . | 2
    ...
    p  q | p + q

Now if any entry in this array is 2 the array vanishes since the row concerned will contain
only one entry. The reverse can only happen if all the entries are unity, in which case
the sums p and q must be equal. This establishes the result.

Example 11.5
Any $\kappa(a_1^{\alpha_1} \ldots a_s^{\alpha_s})$ containing m parts is of order $n^{-(m-1)}$. For example, $\kappa(3^2 2^2)$
is of order $n^{-3}$.
To prove this result we have to consider only the pattern function.

Consider the array

    a_1  a_2  ...  a_m | a

To the single separate there corresponds under Rule 6 the function $n\left(\dfrac{1}{n}\right)^m = n^{-(m-1)}$.
Furthermore, no pattern function can be of greater order in 7 ; for in an array with more 
than one row, with q separates there is associated the factor n" е ека И AS ЕВ where 
mlo qol neal ? 


; and if there is only one entry 
vanishes. Hence the result. 
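11.15. By the above methods Professor Fisher worked out the sampling formulae
for degree not greater than 10, and gave some of the 12th degree. The following are the
results, with a number of corrections.

Second k-Statistic

$$\kappa(2^2) = \frac{\kappa_4}{n} + \frac{2}{n-1}\,\kappa_2^2 \qquad (11.31)$$

$$\kappa(2^3) = \frac{\kappa_6}{n^2} + \frac{12}{n(n-1)}\,\kappa_4\kappa_2 + \frac{4(n-2)}{n(n-1)^2}\,\kappa_3^2 + \frac{8}{(n-1)^2}\,\kappa_2^3 \qquad (11.32)$$

$$\begin{aligned}
\kappa(2^4) = {} & \frac{\kappa_8}{n^3} + \frac{24}{n^2(n-1)}\,\kappa_6\kappa_2 + \frac{32(n-2)}{n^2(n-1)^2}\,\kappa_5\kappa_3 + \frac{8(4n^2-9n+6)}{n^2(n-1)^3}\,\kappa_4^2 \\
& + \frac{144}{n(n-1)^2}\,\kappa_4\kappa_2^2 + \frac{96(n-2)}{n(n-1)^3}\,\kappa_3^2\kappa_2 + \frac{48}{(n-1)^3}\,\kappa_2^4 \qquad (11.33)
\end{aligned}$$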




60 1 160(n — 2) 240(2n2 — 5n + 4 
«(28) = -—À—— ) 
) aptis + nn — je t i n(n — 1)2 кук; + nim — 1) KgK 4 
ts NN nus 4(113n3 —520n? 4-950n? — 8002 +265) B 
n*(n — 1) n*(n — 1) 
1200 4800(n — 2) 2400(5n? — 12% + 9) 
m 
' э — jid * n*(n — 1)? какаш a.) SUO 
160(» — 2)(31n — 53). 960(n — 2)(6n? — 12n +7) » 
+ n*(n — 1) du nn — 1) ris 
1920(n — 2)(9n? — 23n + 16) 480(11n? — 41a? + 59» — 31) , 
mn А _ ——— ө -—————— 
; n3(n — 1) tage n(n — 1) ч 
9600 в , 38400(n — 2) o , 9600(4n? — 9n +6) o o 
Sn — Ip t t aa qe А азораи а 
28800(2n? — 7n + 6) 960(n — 2)(5n — 12) , 28800 А 
n*n — 1) raris + n*(n — 1) Oe artem y АЙЯ 
, 3840000 — 2) , , , 3840 , . 
3 RE us . Se. . . . . (11.35 
n(n — 1) кка + в — 1) * ( ) 
Third k-Statistic

$$\kappa(3^2) = \frac{\kappa_6}{n} + \frac{9}{n-1}\,\kappa_4\kappa_2 + \frac{9}{n-1}\,\kappa_3^2 + \frac{6n}{(n-1)(n-2)}\,\kappa_2^3 \qquad (11.36)$$
27(3n — 4). 27(4n — 7) 


Kok 


1 27 
Вуз 
p ni Tan =a + ma lj e p n(n — 1)? 


54(4n — 7) 162(5n — 12) 36(7n® — 30n + 34) а 


ч 

(n. — 1) — 2) eT e — Ij — et 3 (n — 1)8(n — 2)? 
4 108n(5n — 12) x a A ‚ (11.37) 

(n — 1)(% — 2)? ^ 
id 54 108(2n — 3) 27(17n? — 49n 4- 35) 

in gift n*(n — Tyee + n*(n — 1)? "dec n(n — 1) Keka 

108(7n? — 20% + 10). 27(17n* — 47n + 39) a 27(37n — 70) 2 

гд n(n — 1)? Kew n*(n — 1)? е + n(n — 1)%(% — 2) как; 
29/5 uj 5 224 
324(19n? — 67% + 54) o 162(65n? 2058 F e AN 
n(n — 1)%(n — 2) n(n — 1) (n — 2) 
= 2 99 24 
zh 108(82n? — 481n? + 958% — 640) n 108(59n A 200 itin 
n(n — 1) 30 — 2)° n(n — 1) (n — 2) 
b 324(75n? — 473n? + 10162 — 15б) as 
n(n — 1)» — 2)* 
ЕЕ 97(173n* — 1503n? + 49625? — 7380» + cls 
n(n — Ш) (= 2)8 
, 108(71n? — 263n + 234) TT: ме a im jae 
— In — 
(n — 1) (0 — 2)? n 
2 972(99n3 — 688n? + 1612» — 1280) , 
486(63n? — 290n + 352) "M ( ae BS TER 


(n — 1} (n — 2)? 


162(87n? — 594n® + 1420n — 1176) , , 972n(23n? — 103% + 118) 


p (n — Yys(n — 2)5 блк == 1) сасу xe | 
648n(108n2 — 510% + 640 648n?(5n — 12) 
F (n — 1)3(n — 27$ кк (n — 1) — 2)%^ кў E . . (11.38) 


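Fourth k-Statistic

$$\begin{aligned}
\kappa(4^2) = {} & \frac{\kappa_8}{n} + \frac{16}{n-1}\,\kappa_6\kappa_2 + \frac{48}{n-1}\,\kappa_5\kappa_3 + \frac{34}{n-1}\,\kappa_4^2 + \frac{72n}{(n-1)(n-2)}\,\kappa_4\kappa_2^2 \\
& + \frac{144n}{(n-1)(n-2)}\,\kappa_3^2\kappa_2 + \frac{24n(n+1)}{(n-1)(n-2)(n-3)}\,\kappa_2^4 \qquad (11.39)
\end{aligned}$$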

«4*) = къ + = = perc TIE) koka + rom кек 
оо eae ПО 
a 


144(56n? — 257n + 302) 2 1440(4n — 11) 
+ wai a tp- pagi. 


>” 


Ka + 


Ksk 


+ 


KeK 4Ka 


1152(22n? — 106n + 133) ‚ 8(709n? — 3430n + 4456 
MOL USER а STO DA 
a 288(19n? — 98n? + 125n + 2) a | 1728(24n3 — 140n2 + 200n + 4 
( 1) — 2)» —3) “2 7 Шз 
) (n — 1)*(n — 2)*(n — 3) tee 
q 432(49n3— 287п2--408% +12) , , | 864(103n9—629n2+948n +24) 
(n—1)(n—2)(n—83) 48 (n—1)(—3)4—3) “aba 
4 288(41n1 —384n3--190952 —19825 — 36) T 288n(53n*—179n —52) 
(n —1)*(n —2)2(m —3)2 (n —1)*(n — 2)2(n —3) 
1728n (29n —196n? +317 +62) sr 1728n(n --1)(n? —5n ea) 6 
(n—1)*(n—2)?(n —3)2 OE (@—1)%Xn—2)%m—a)2 ^  : (11.40) 


Karch 


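Fifth k-Statistic

$$\begin{aligned}
\kappa(5^2) = {} & \frac{\kappa_{10}}{n} + \frac{25}{n-1}\,\kappa_8\kappa_2 + \frac{100}{n-1}\,\kappa_7\kappa_3 + \frac{200}{n-1}\,\kappa_6\kappa_4 + \frac{125}{n-1}\,\kappa_5^2 \\
& + \frac{200n}{(n-1)(n-2)}\,\kappa_6\kappa_2^2 + \frac{1200n}{(n-1)(n-2)}\,\kappa_5\kappa_3\kappa_2 + \frac{850n}{(n-1)(n-2)}\,\kappa_4^2\kappa_2 + \frac{1500n}{(n-1)(n-2)}\,\kappa_4\kappa_3^2 \\
& + \frac{600n(n+1)}{(n-1)(n-2)(n-3)}\,\kappa_4\kappa_2^3 + \frac{1800n(n+1)}{(n-1)(n-2)(n-3)}\,\kappa_3^2\kappa_2^2 \\
& + \frac{120n^2(n+5)}{(n-1)(n-2)(n-3)(n-4)}\,\kappa_2^5 \qquad (11.41)
\end{aligned}$$

Sixth k-Statistic

$$\begin{aligned}
\kappa(6^2) = {} & \frac{\kappa_{12}}{n} + \frac{1}{n-1}\left(36\kappa_{10}\kappa_2 + 180\kappa_9\kappa_3 + 465\kappa_8\kappa_4 + 780\kappa_7\kappa_5 + 461\kappa_6^2\right) \\
& + \frac{n}{(n-1)(n-2)}\left(450\kappa_8\kappa_2^2 + 3600\kappa_7\kappa_3\kappa_2 + 7200\kappa_6\kappa_4\kappa_2 + 6300\kappa_6\kappa_3^2 \right. \\
& \qquad \left. + 4500\kappa_5^2\kappa_2 + 21600\kappa_5\kappa_4\kappa_3 + 4950\kappa_4^3\right) \\
& + \frac{n(n+1)}{(n-1)(n-2)(n-3)}\left(2400\kappa_6\kappa_2^3 + 21600\kappa_5\kappa_3\kappa_2^2 + 15300\kappa_4^2\kappa_2^2 + 54000\kappa_4\kappa_3^2\kappa_2 + 8100\kappa_3^4\right) \\
& + \frac{n^2(n+5)}{(n-1)(n-2)(n-3)(n-4)}\left(5400\kappa_4\kappa_2^4 + 21600\kappa_3^2\kappa_2^3\right) \\
& + \frac{n(n+1)(n^2+15n-4)}{(n-1)(n-2)(n-3)(n-4)(n-5)}\,720\kappa_2^6 \qquad (11.42)
\end{aligned}$$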
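Product-Cumulant Formulae

$$\kappa(32) = \frac{\kappa_5}{n} + \frac{6}{n-1}\,\kappa_3\kappa_2 \qquad (11.43)$$

$$\kappa(42) = \frac{\kappa_6}{n} + \frac{8}{n-1}\,\kappa_4\kappa_2 + \frac{6}{n-1}\,\kappa_3^2 \qquad (11.44)$$

$$\kappa(52) = \frac{\kappa_7}{n} + \frac{10}{n-1}\,\kappa_5\kappa_2 + \frac{20}{n-1}\,\kappa_4\kappa_3 \qquad (11.45)$$

$$\kappa(62) = \frac{\kappa_8}{n} + \frac{12}{n-1}\,\kappa_6\kappa_2 + \frac{30}{n-1}\,\kappa_5\kappa_3 + \frac{20}{n-1}\,\kappa_4^2 \qquad (11.46)$$

$$\kappa(72) = \frac{\kappa_9}{n} + \frac{14}{n-1}\,\kappa_7\kappa_2 + \frac{42}{n-1}\,\kappa_6\kappa_3 + \frac{70}{n-1}\,\kappa_5\kappa_4 \qquad (11.47)$$

$$\kappa(82) = \frac{\kappa_{10}}{n} + \frac{16}{n-1}\,\kappa_8\kappa_2 + \frac{56}{n-1}\,\kappa_7\kappa_3 + \frac{112}{n-1}\,\kappa_6\kappa_4 + \frac{70}{n-1}\,\kappa_5^2 \qquad (11.48)$$

$$\kappa(43) = \frac{\kappa_7}{n} + \frac{12}{n-1}\,\kappa_5\kappa_2 + \frac{30}{n-1}\,\kappa_4\kappa_3 + \frac{36n}{(n-1)(n-2)}\,\kappa_3\kappa_2^2 \qquad (11.49)$$

$$\kappa(53) = \frac{\kappa_8}{n} + \frac{1}{n-1}\left(15\kappa_6\kappa_2 + 45\kappa_5\kappa_3 + 30\kappa_4^2\right) + \frac{n}{(n-1)(n-2)}\left(60\kappa_4\kappa_2^2 + 90\kappa_3^2\kappa_2\right) \qquad (11.50)$$

$$\kappa(63) = \frac{\kappa_9}{n} + \frac{1}{n-1}\left(18\kappa_7\kappa_2 + 63\kappa_6\kappa_3 + 105\kappa_5\kappa_4\right) + \frac{n}{(n-1)(n-2)}\left(90\kappa_5\kappa_2^2 + 360\kappa_4\kappa_3\kappa_2 + 90\kappa_3^3\right) \qquad (11.51)$$

$$\begin{aligned}
\kappa(73) = {} & \frac{\kappa_{10}}{n} + \frac{1}{n-1}\left(21\kappa_8\kappa_2 + 84\kappa_7\kappa_3 + 168\kappa_6\kappa_4 + 105\kappa_5^2\right) \\
& + \frac{n}{(n-1)(n-2)}\left(126\kappa_6\kappa_2^2 + 630\kappa_5\kappa_3\kappa_2 + 420\kappa_4^2\kappa_2 + 630\kappa_4\kappa_3^2\right) \qquad (11.52)
\end{aligned}$$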
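The numerical coefficients of the two-part terms in the formulae $\kappa(rs)$ above all arise from arrays of two rows and two columns with no zero entry, each carrying the pattern function $1/(n-1)$. The following sketch (the function name is hypothetical, and the counting is an illustrative reading of the rules rather than the author's own computation) sums the products of multinomial coefficients over the admissible arrays, halving the total when the two rows correspond to the same cumulant.

```python
from math import comb


def two_part_coefficients(r, s):
    """Coefficients of kappa_a kappa_b (a >= b >= 2, a + b = r + s) in kappa(r s).

    Each coefficient is to be multiplied by the pattern function 1/(n - 1).
    """
    out = {}
    for a in range(r + s - 2, 1, -1):
        b = r + s - a
        if b > a or b < 2:
            continue
        total = 0
        for c in range(1, s):            # second column split as (c, s - c)
            u = a - c                    # first-column entry in the first row
            if 1 <= u <= r - 1:          # first column split as (u, r - u)
                total += comb(r, u) * comb(s, c)
        if a == b:
            total //= 2                  # two like rows: divide by 2!
        if total:
            out[(a, b)] = total
    return out


coeffs_53 = two_part_coefficients(5, 3)
```

A zero entry is excluded because it would leave some row with a single entry, and such arrays vanish; for example, `coeffs_53` should reproduce the bracket 15, 45, 30 of (11.50).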


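$$\begin{aligned}
\kappa(54) = {} & \frac{\kappa_9}{n} + \frac{1}{n-1}\left(20\kappa_7\kappa_2 + 70\kappa_6\kappa_3 + 120\kappa_5\kappa_4\right) \\
& + \frac{n}{(n-1)(n-2)}\left(120\kappa_5\kappa_2^2 + 600\kappa_4\kappa_3\kappa_2 + 180\kappa_3^3\right) \\
& + \frac{n(n+1)}{(n-1)(n-2)(n-3)}\,240\kappa_3\kappa_2^3 \qquad (11.53)
\end{aligned}$$

$$\begin{aligned}
\kappa(64) = {} & \frac{\kappa_{10}}{n} + \frac{1}{n-1}\left(24\kappa_8\kappa_2 + 96\kappa_7\kappa_3 + 194\kappa_6\kappa_4 + 120\kappa_5^2\right) \\
& + \frac{n}{(n-1)(n-2)}\left(180\kappa_6\kappa_2^2 + 1080\kappa_5\kappa_3\kappa_2 + 720\kappa_4^2\kappa_2 + 1260\kappa_4\kappa_3^2\right) \\
& + \frac{n(n+1)}{(n-1)(n-2)(n-3)}\left(480\kappa_4\kappa_2^3 + 1080\kappa_3^2\kappa_2^2\right) \qquad (11.54)
\end{aligned}$$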
12(2n = д т ГИ + (11.55 
(sary = p” 4 UM E PLE m ^g n s (11.55) 
бй — 1) $m ~ 19 
i(493) — = == Lu тл LE ia d ad — 3 Malte + n(n — 1)* Kt 
sic K ка r [ D П 0 0 0 (11,66) 
uc RD куё kgs à їй | 
1 24 20(3n — 4) L 20(5n — 7) 
(522) E П ЖП = SE Fr n(n 1) KgKs + n(n — 1) кк 
ОИ 480 120 5 3 
" es qp (n — sitit * @— 1p" : s P s diui 
1 28 19(7 — 9) 4(41n — 56) 
FUR SS UE nn — I) n A лу, f n(n —1)2 "4 
200m — 7) , | 108 „ 840 560, 
n(n —- 1)? «t (n — Tje “92 i (n — туз soa 2 (n — 1j a 
ce А Loewe Us TES S c. fie 
Su d 21 6(8n — 11) 9(àn — 5) , 
ет UE nu 1 сорур eft nw — 1 
USC aues 190m — 20). 36» А 
= а 0) 8 = 19 — H+ GS ly(n—32)* * (11.59) 
1 26 24(3n — 4) 10(11% — 17) 
82 6 ni^ n(n — ees T n(n — 1)? fuse n(n — 1)? gus 
305m — 9) — .  19(01m — = 3605» — 12) 
oF (n — 1)*(n — 2) Aa р (n — 1) — facis (n — 1)%(% — — 9) “ 
360» " 
(n—18m—2)99 * P F з s p я . - (11.60) 
1 31 101% — 131 5(87n — 55 
к(532) = aif + "gu p^ sls n туз ккз ar ee 52 кока 
5(23n — 35) , , 3000—16) ,, 30(45n — 92) 
aw —19 TG Hm — эў" à Т) — оу eta 




60(15% — 31) к? 


30(45n — 103). à 


720n 


кака o. 


(@ — 1n — 2) 3 * à — 19 — 2)" * i — 18 — 8j 
1620n Bas 
pope ga 9 0. = JM . (11.61) 
2 1 2 8(13n — 37) 4(49n — 73) 
4?2) = 
pos n? Bier n(n — nud T n(n — 1)? Ais + n(n — 1)? E 
4029 — 46) , , 8(37n — 65) „, 1536 
nu —1)2 5" (в 1) —2) °* P3 бел 
144(7љ — 15) 72(21n — 50) 3 96(10n? — 27 — 1) 5 
Tp cs Ds cd] P eel а) а M cs 3) 02 


144(17n? — 53m — 2) ssi 192n(n + 1) * 
(—D5»—3)-—37*"(&—1a—3m-—35' -* - (82 
= 25) = 
ie(489) = zu = s pj ^s + uc es кз кз 4 е Ki 
(19) = 84 180100 B 72(238n — 62) 
+ nn — шыш. + (n — 1j(n — | j^ ud + pg s a 
. 519 — 48) , 54(33һ* — 148» + 172) 9 o 
(n — 1)*(n 9) 4^ ia (n — 1)%(% — 2)? hag 


72n(l7n — 40) 


108n(27n — 70) 


210n? 


(n — 1)(n — 2)? кй + 


oe 


T 30 
«(322) = ni’? ds кк + 


n(n — 1) n*(n — 
240 360(2n — 3) 
n(n — 1)? 


An — 1 E ksk + 


KgKa + 


(n — 1)*(n — 2) 


X 23n — 37) 


s + + (11.63) 


(n — 1)*(n — gi 


= 2(9n2 — 2 
53) urs ER 12(9» 23n + 16) 


1)? n(n = He 


кука 


какзкә + 


кзк$ 


Т7 Iy (11.64) 


4(47n?'— 120% -+ 81 
KK3 + ( а. ) 


n?(n — 1) n*(n 
12(0n? — 24n + 17) ə , 


Kok, 


— 1)? n*(n — 1)8 
360 2. 288(5n — 7) 


144(7n — 10) m 
n(n — 1)8 Р 


«(3%2) l 37 6(17n — 27) 


n?(n — 1)? iEn n(n — 1)? 
24(49n — 95) 960 
ETC! — 1)? : (n — 1) 


Ккк nin aa aie ^ 
at m(n — 1): 5282 


кў + —— кка (11.65) 


216 
—1)3 a Tg = 3 


3(61n? — 166» + 117) 


gie + nn — 1) n3n 
2(59n? — 154% + 113) , 


| 


— 1)? эў n*(n — 1)8 кес 


6(67% — 131) 8 


кї 
n?(n — 1)? i 


24(71n2 — 246n + 202) 


' n(n — 1)2%(n — 


3j KeK 
36(29n? — 103% + 93) , 


n(n — 1)%(% — 2) 
36(38n? — 155n + 160) 
n(n — 1)5(% — 2) 
144(19n — 44) , 


KgKaKa 


как: + 


n(n — 1)%(n — 2) a 
72(14m — 23) 
(n — 1)*(n — 
288n 5 


3) кака 


a-pa- tt 


(n — 


‚ (11.66) 
m 


1)3(n — 2) Б 




11.16. Additional formulae for the case of a normal parent population have been 
worked out by Wishart (1930). There are two general formulae :— 


к(27) =, ‚ ^. (0187) 
: ?'(r + 1pg — 1)! 
(p22?) =e : . (11.68) 


and the following specific formulae of degree 12 and upwards (those of degree 10 and lower,
of course, being derivable from equations (11.30) to (11.66) by putting all κ's higher than
the second equal to zero).


34,560n 
OD с-ка U^ o dA QUIZ : . (11.69) 


7776n*(5n — 12) “ 


89) = — loc ЗИС 
__ 108,864n?(5n — 12 
«(3522 = os 0007 9 À me E NE eu gb) 
1,741,824n%(5n — 12 
(39) — 466,560n*(22n? — 111m + 142 
(9) = aie so ^ro i MEME REL drin: 
0| — 18 o 
Eee (aD E З : е 8 A . (11.74) 
x(3522) = 224 : © А E : » К Д n . (11.75) 
«4222 — 1920n(n + 1) 
Sweat eee es re. отв) 
(4298) = 23,0402 +1) a 
(rc д == б Бу ee ee c ЛАКАН уу 
(4298) = —_ 322,560n(n +1) g 
ПШ ДО (= ЧЕН Fee >=“. (изу 


K(482) = 20,736n(n + 1)(n? — 5n t2); 
(ЙМ (йд) (жду *- 
к(4392) = 22000 + D(n* — 5n + 2) i е : 
(n — 1)4(n — 2)*(n — 3) 9 . . - (11.80) 

(4222) — ®6%Ь864щЩ% + 1)(n? — 5n + 2) 
(EWG О (желш ОЕ дыт x Y oll (11.81) 

K 6912n(n + 1) ; 

«65 = Tim — Bn — зуз (581 — 42882 + 1025n* — ат + 


. (11.79) 


180} «8. (11.82) 


к(а49) = 16 itas) з 
cm . - . + (11.83) 
288 
k(4122) = die (44) . 
@= тув" ) - (11.84) 
484.125 
«(45) = ee approximately . * . 


"09. (11.85) 


In virtue of the result of Example 11.4, expressions of odd degree vanish, e.g.
$\kappa(32^r) = \kappa(52^r) = 0$. Further, in virtue of (11.68), $\kappa(p2^r) = 0$ if p > 2, for $\kappa(p) = \kappa_p = 0$
for the normal distribution. Methods of proof of (11.67) and (11.68) are suggested in
Exercise 11.9. Exact results for $\kappa(4^5)$ and $\kappa(4^6)$ are given by Hsu and Lawley (1939).

Proof of the Validity of the Rules
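11.17. We now proceed to prove the validity of the rules enunciated and exemplified
above. Rules 1 and 2 have already been proved.
As a preliminary let us define an operator $\partial_p$ such that

$$\left.\begin{aligned}
\partial_p \mu'_r &= r(r-1)\ldots(r-p+1)\,\mu'_{r-p}, & r &> p \\
\partial_p \mu'_p &= p! \\
\partial_p \mu'_r &= 0, & r &< p
\end{aligned}\right\} \qquad (11.86)$$

and

$$\partial_p(AB) = (\partial_p A)B + A(\partial_p B) \qquad (11.87)$$

so that $\partial$ acting on a product is distributive.
In virtue of (11.87) we have

$$\partial_p(\mu'_r)^m = m(\mu'_r)^{m-1}\,\partial_p\mu'_r = (\partial_p\mu'_r)\,\frac{\partial(\mu'_r)^m}{\partial\mu'_r}.$$

It follows that if f is a polynomial function in the μ's

$$\partial_p f = (\partial_p\mu'_p)\frac{\partial f}{\partial\mu'_p} + (\partial_p\mu'_{p+1})\frac{\partial f}{\partial\mu'_{p+1}} + \ldots \qquad (11.88)$$

and this also holds if f can be expanded in a series of polynomials in the μ's.
Now consider the expression defining the seminvariants in terms of the moments (3.11),

$$\exp\left(\kappa_1 t + \ldots + \kappa_p\frac{t^p}{p!} + \ldots\right) = 1 + \mu'_1 t + \ldots + \mu'_p\frac{t^p}{p!} + \ldots$$

On operating on both sides by $\partial_p$ there results

$$\exp\left(\kappa_1 t + \ldots + \kappa_p\frac{t^p}{p!} + \ldots\right)\left(\partial_p\kappa_1\,t + \ldots + \partial_p\kappa_p\,\frac{t^p}{p!} + \ldots\right) = t^p\left(1 + \mu'_1 t + \ldots + \mu'_p\frac{t^p}{p!} + \ldots\right)$$

and hence

$$\partial_p\kappa_1\,t + \ldots + \partial_p\kappa_p\,\frac{t^p}{p!} + \ldots = t^p.$$

This is an identity in t and hence

$$\left.\begin{aligned}
\partial_p\kappa_p &= p! \\
\partial_p\kappa_q &= 0, \quad q \neq p
\end{aligned}\right\} \qquad (11.89)$$

For example,

$$\begin{aligned}
\kappa_4 &= \mu'_4 - 4\mu'_3\mu'_1 - 3\mu'^2_2 + 12\mu'_2\mu'^2_1 - 6\mu'^4_1 \\
\partial_1\kappa_4 &= 4\mu'_3 - 4\mu'_3 - 12\mu'_2\mu'_1 - 12\mu'_2\mu'_1 + 24\mu'^3_1 + 24\mu'_2\mu'_1 - 24\mu'^3_1 = 0 \\
\partial_2\kappa_4 &= 12\mu'_2 - 24\mu'^2_1 - 12\mu'_2 + 24\mu'^2_1 = 0 \\
\partial_3\kappa_4 &= 24\mu'_1 - 24\mu'_1 = 0 \\
\partial_4\kappa_4 &= 4!
\end{aligned}$$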
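The action of $\partial_p$ on a polynomial in the moments is easily mechanised, and the worked example for $\kappa_4$ makes a convenient check. In the sketch below (a hypothetical representation, not the author's) a polynomial in $\mu'_1, \ldots, \mu'_4$ is a dictionary mapping exponent tuples $(e_1, e_2, e_3, e_4)$ to integer coefficients, and `d_op` applies (11.86) to one factor at a time in accordance with the product rule (11.87).

```python
def falling(r, p):
    """Return r(r-1)...(r-p+1)."""
    out = 1
    for j in range(p):
        out *= r - j
    return out


def d_op(p, poly):
    """Apply the operator d_p to a polynomial in mu'_1..mu'_R.

    `poly` maps exponent tuples (e_1, ..., e_R) to coefficients.
    """
    out = {}
    for exps, coeff in poly.items():
        for idx, e in enumerate(exps):
            r = idx + 1
            if e == 0 or r < p:
                continue                      # d_p mu'_r = 0 when r < p
            new = list(exps)
            new[idx] -= 1                     # remove one factor mu'_r ...
            if r > p:
                new[r - p - 1] += 1           # ... replacing it by mu'_{r-p}
            key = tuple(new)
            out[key] = out.get(key, 0) + coeff * e * falling(r, p)
    return {k: c for k, c in out.items() if c != 0}


# kappa_4 = mu'_4 - 4 mu'_3 mu'_1 - 3 mu'_2^2 + 12 mu'_2 mu'_1^2 - 6 mu'_1^4
kappa4 = {
    (0, 0, 0, 1): 1,
    (1, 0, 1, 0): -4,
    (0, 2, 0, 0): -3,
    (2, 1, 0, 0): 12,
    (4, 0, 0, 0): -6,
}
results = {p: d_op(p, kappa4) for p in (1, 2, 3, 4)}
```

An empty dictionary stands for the zero polynomial, so the first three entries of `results` should be empty and the fourth should be the constant 4! = 24.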




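11.18. Now in accordance with Rule 1, which we have already established, $\kappa(a_1^{\alpha_1} \ldots a_s^{\alpha_s})$
and hence $\mu(a_1^{\alpha_1} \ldots a_s^{\alpha_s})$ may be expressed in terms of parent κ's by an equation of the
form

$$\mu(a_1^{\alpha_1} a_2^{\alpha_2} \ldots) = \Sigma\{A\,(\kappa_{b_1}^{\beta_1}\kappa_{b_2}^{\beta_2}\ldots)\} \qquad (11.90)$$

where A is a factor which it is our object to find. Operate on both sides of (11.90) by
$(\partial_{b_1}^{\beta_1}\partial_{b_2}^{\beta_2}\ldots)$. Every term on the right is annihilated except that in $(\kappa_{b_1}^{\beta_1}\kappa_{b_2}^{\beta_2}\ldots)$
and we have

$$A\,(b_1!)^{\beta_1}(b_2!)^{\beta_2}\ldots\beta_1!\,\beta_2!\ldots = (\partial_{b_1}^{\beta_1}\partial_{b_2}^{\beta_2}\ldots)\,\mu(a_1^{\alpha_1} a_2^{\alpha_2}\ldots) \qquad (11.91)$$

We now consider an operator $\theta_p$, analogous to $\partial_p$, which, when acting on a power $x^r$ of x (of
any suffix), reduces the exponent by p and multiplies by r(r − 1) ... (r − p + 1); and
we will suppose the operator to be distributive.* Regarding $\mu(a_1^{\alpha_1} a_2^{\alpha_2}\ldots)$ as the mean
value of $(k_{a_1}^{\alpha_1} k_{a_2}^{\alpha_2}\ldots)$ we see that the result of operating by the ∂'s on the mean value is
the same as that given by taking the mean value of the operation of the θ's. But this
latter operation results in a constant, which is equal to its mean value; and we thus have

$$A = \frac{1}{(b_1!)^{\beta_1}(b_2!)^{\beta_2}\ldots\beta_1!\,\beta_2!\ldots}\,(\theta_{b_1}^{\beta_1}\theta_{b_2}^{\beta_2}\ldots)(k_{a_1}^{\alpha_1} k_{a_2}^{\alpha_2}\ldots) \qquad (11.92)$$

Our rules are concerned with the evaluation of this operation.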


11.19. Consider now a completed array of type (11.28). A little reflection will show
that there is one such array for every term in (11.92) which does not vanish by operation,
and that every term in (11.92) will have its corresponding completed array. The numbers
in the body of the array are the powers of x occurring in the k-product; added horizontally
they compose the orders of the operators; added vertically they compose the orders of
the corresponding k's. A completed array is, so to speak, a chart of part of the operation,
and the whole operation is the sum of all possible completed arrays.

The operation (11.92) gives us the coefficients in $\mu(a_1^{\alpha_1} a_2^{\alpha_2}\ldots)$, but we wish to find
those in the corresponding $\kappa(a_1^{\alpha_1} a_2^{\alpha_2}\ldots)$. The necessary allowance is made by Rule 3
which we now prove; that is, the coefficient of $\kappa_{b_1}^{\beta_1}\kappa_{b_2}^{\beta_2}\ldots$ in $\kappa(a_1^{\alpha_1} a_2^{\alpha_2}\ldots)$ is given by
all completed arrays, ignoring those which are resolvable into separate blocks each confined
to separate rows and columns.

Referring to equation (11.27), expressing the relation between mu 
and cumulants, we see that x(a,“ а= . . .) is the sum of terms 
one, two, three . . . multivariate moments. ад quts \ itself 
Consider а two-part term such as џ(а,* a,%* ©. .)u(a,% y^ єз) where a; UM 2» 
etc. Its coefficient in the expansion on the right-hand side of (11.27) is iw. 


Itivariate moments 


composed of products of 
The first term is ru А 


PTI! о eoo 
and hence the coefficient with which it appears in the formula for к(а а, )i 
uq We s. +) 18 


EK TM = — (es 
vilar! аа" о (ea aCe ме aad, 


Now p(a, ^a, . . .) will itself have an array of type (11.28) with column totals 
and row totals, say (b,^ b^ .. у, and similarly for p(a,%: ages ) 
E др д» p 
* 0р may be regarded as equivalent to [4 9". д} ; 

p axe + ap SENS azn » 10. to S, in the notation 
of 11.10. i 


, " 
(бз а, , 


Provided that 




Ві. + f, = В, these arrays will correspond to terms in the «’s which, when multiplied, will 
give a term in («^ кух. . .). Thus the product of these terms may be considered as an 
array of type (11.28) with column totals (a,;* a," . . .) and row totals (6,2: bs...) and 
with the body of the table resolvable into two separate blocks. Since there are «, columns 
01 


of total a,, there will be ( a) (©) products of this type in the expression which gives 
[^ 05 


$\mu(a_1^{\alpha_1} a_2^{\alpha_2}\ldots)$. This factor is the same as (11.93) but of opposite sign. Hence, if we
ignore the separate two-part blocks in the array for μ we shall have allowed for the products
of two moments which must be subtracted from μ to give κ.

Now some of these separate blocks will themselves be separable into two blocks, and
in subtracting them all from $\mu(a_1^{\alpha_1} a_2^{\alpha_2}\ldots)$ we subtract too much. For example, if there
are three separate blocks, L, M, N, we shall, by considering L and (M + N) as two blocks,
have subtracted L, M, N. We shall have done the same by considering M and (L + N),
and N and (L + M) as two blocks. That is, we have subtracted 2L, 2M, 2N too much.
We must restore these blocks to the array for μ again. Such additions, summed over all
blocks of three, will be found to equal the terms in the expansion of (11.27) which result
from the product of three moments.

In restoring these blocks we restore too many of the cases where there are four separate
blocks. These must be subtracted again, and correspond to the negative term in (11.27)
involving the product of four moments. Proceeding in this way we establish Rule 3.


' 


11.20. Now we proceed to Rules 4, 5 and 6, which are the fundamental rules of the
whole process. Consider again the array of type (11.28), to fix the ideas, say,

    2  3  1 | 6
    1  1  . | 2
    1  .  1 | 2
    4  4  2 | 10        (11.94)

This array will represent a number of terms in the operation, each of which consists of the
operation of $\theta_6$ on a term $x^2 \cdot x^3 \cdot x$ (the first row), $\theta_2$ on $x \cdot x$ (the second row), and so on.
Provided that the suffixes of the x's in any row are alike, every suffix of the x's will provide
a term, for $k_p$ contains terms with every distribution of powers (adding to p) and suffixes.
There will, for instance, be terms of the following kind :—

аР 7 Е DICE 
wr oa ж ж ац, my Rt ү 
Bg 20 . 2, qd. 21. 21 
| Ug =" tg, Wa i Wa, ML D 


In fact, for any completed array, we have terms in which
(i) all the x's have the same suffix (n in number, one for each suffix),
(ii) all the x's but one row have the same suffix (n(n − 1) in number),
(iii) all the x's but two rows have the same suffix and the remaining two are the same
(n(n − 1) in number),
and so on.

These cases correspond to the various separations dealt with in Rule 5.
Now in case (i) the term in any column arises from the term in $x^p$ in $k_p$ and (apart
from numerical factors which are considered presently) is $n^{-1}$, from equation (11.16).
Hence any column which contains an entry contributes a factor $n^{-1}$ and the total function
of n arising from case (i) is the product of n and of $(n^{-1})$ to the power of the number of
columns containing a non-zero entry.

Similarly in cases (ii) and (iii) the n-function for each separation is the product of
n(n − 1) and, for each column, a factor $n^{-1}$ or $-\dfrac{1}{n(n-1)}$ according as the column contains
non-zero entries in one or in both parts of the separation; and so on.

This explains the origin of the pattern function as described in Rule 6. But in order
to establish that rule completely (and incidentally to establish Rules 4 and 5) we have to
show that the numerical coefficients arising from each separation are the same. When
this is done the validity of Rule 6 is demonstrated, for the separate contributions in n may
be added together to give the pattern function and the whole multiplied by the numerical
coefficient.

$\theta_1$ may be considered as the operation of picking out an x from the operand in all
possible ways and replacing it by unity. Similarly $\theta_p$ may be regarded as picking out
p x's with the same suffix and replacing them by unity. It is thus evident that operating
on a k product by a θ product $(\theta_{b_1}^{\beta_1}\theta_{b_2}^{\beta_2}\ldots)$ of the same degree will yield a result which is
the number of ways in which sets of x's can be picked out of the k product so that each
set contains $b_1$ of one suffix, $b_2$ of a second suffix (which may be the same as the first), and
so on.

Now consider the operation (11.92) in which the k's are expressed in the simplified
form (11.17). The operations θ being distributive, we shall emerge from the operation
with a sum of terms comprising all the possible ways in which the individual x's can be
picked out of the k product such that the row and column totals of the two-way array are
satisfied. Consider the sets corresponding to a particular array, such as (11.94). The
contribution to the total will consist of the ways of picking out individuals such that
(i) from the individuals in the first $k_4$ are chosen four in the partition (2, 1, 1),
(ii) from the second $k_4$ are chosen four in the partition (3, 1),
(iii) from the $k_2$ are chosen two in the partition (1, 1),
(iv) these are associated in all possible ways such that individuals in a row arise from
the same suffix.
On consideration it will be seen that the total number of ways of doing this is the number
of ways of allocating the individuals from column totals as required by Rule 5; and this is
true whether sets of rows have the same suffix or not.
Rules 5 and 6, and hence Rule 4, follow at once.


11.21. The remaining rules are ancillary.
Rule 7 follows from Rule 2. In fact, the pattern function is independent of the numbers
composing the array, and the pattern with a row containing one element can therefore
form the skeleton of an array in which that element is unity; and this would entail the
appearance of $\kappa_1$, which by Rule 2 is impossible.
Rule 8 follows from Rule 6. A column containing a single entry has that entry in
one separate of all the separations, and the contributions are therefore all multiplied
by $n^{-1}$ owing to its presence.
Rule 10 follows from Rule 8. The addition of a unit part is equivalent to the addition
of an extra column containing unity. This multiplies all pattern functions by $\dfrac{1}{n}$, leaves
numerical coefficients unchanged and increases the suffix of every κ according to the row
in which the unit appears.


11.22. There only remains to prove Rule 9. Note that any pattern function can be 
evaluated linearly in terms of the functions of the pattern obtained by omitting one of 
the columns. For example, consider the right-hand column of 


<a es 
x . XBX 
AO ais . 
Xe LX 


and the contributions to the pattern function from it. The 15 separations which are 
possible with four rows can be divided into two classes, that in which the two rows 
in the fourth column lie in the same separate and that in which they do not. In separations 
of the first type the contributions from the first three columns will be the contributions 
of all separations of 


ie eee oc П) 


Mao S 
а + MA ОКОВ) 

x Х . 
in which the first two rows are amalgamated. Considering the function of the first three rows 

X UX 

o6 ae „Хх 
E M Mu DT o (11.97) 

хх 


in which amalgamation has not taken place, we see that the contribution consists of all
contributions which do not occur in the first. Calling the first A and the second B, we see
that the contribution is

$$\frac{1}{n}A - \frac{1}{n(n-1)}(B - A) = \frac{1}{n-1}A - \frac{1}{n(n-1)}B,$$

i.e. a linear function of the derived patterns A and B. The proof of the general result
follows exactly the same lines.

Now if a pattern may be divided into two groups connected only by a single column
we can reduce it step by step by omitting the other columns. We end up with this single
column, and the pattern function of this column must vanish; for the column total
a corresponds to $k_a$, whose mean value the one-column array expresses, and since by definition
this mean value is $\kappa_a$, no composite terms such as would be given by two rows or more
can appear.


11.23. As an illustration of the way in which the sampling formulae can be used to
approximate to a sampling distribution, let us consider the distribution of $\sqrt{b_1}$ in samples
from a normal population. We have, in terms of the sample moments,

$$\sqrt{b_1} = \frac{m_3}{m_2^{3/2}} = \frac{n-2}{\sqrt{n(n-1)}}\,\frac{k_3}{k_2^{3/2}}.$$

For a normal distribution the variance of $k_3$, $\kappa(3^2)$, is, by (11.36), equal to

$$\frac{6n\kappa_2^3}{(n-1)(n-2)}.$$

We therefore consider the statistic

$$x = \left\{\frac{(n-1)(n-2)}{6n}\right\}^{1/2}\frac{k_3}{k_2^{3/2}} = \frac{n-1}{\sqrt{6(n-2)}}\,\sqrt{b_1},$$

which will, to order $n^{-1}$, have unit variance. We have

$$x^2 = \frac{(n-1)(n-2)}{6n}\,\frac{k_3^2}{\kappa_2^3}\left(1 + \frac{k_2 - \kappa_2}{\kappa_2}\right)^{-3} \qquad (11.100)$$

Since the population is symmetrical the mean value of x is zero. We then have, expanding
(11.100),

$$\begin{aligned}
x^2 = \frac{(n-1)(n-2)}{6n}\Big\{ & \frac{1}{\kappa_2^3}k_3^2 - \frac{3}{\kappa_2^4}k_3^2(k_2-\kappa_2) + \frac{6}{\kappa_2^5}k_3^2(k_2-\kappa_2)^2 - \frac{10}{\kappa_2^6}k_3^2(k_2-\kappa_2)^3 \\
& + \frac{15}{\kappa_2^7}k_3^2(k_2-\kappa_2)^4 - \frac{21}{\kappa_2^8}k_3^2(k_2-\kappa_2)^5 + \frac{28}{\kappa_2^9}k_3^2(k_2-\kappa_2)^6 - \ldots\Big\} \qquad (11.101)
\end{aligned}$$

The variance may be obtained by taking mean values of both sides, and since $\kappa_2$ is the
mean value of $k_2$ we have

$$\begin{aligned}
\operatorname{var}(x) = \frac{(n-1)(n-2)}{6n}\Big\{ & \frac{1}{\kappa_2^3}\mu(3^2) - \frac{3}{\kappa_2^4}\mu(3^2 2) + \frac{6}{\kappa_2^5}\mu(3^2 2^2) - \frac{10}{\kappa_2^6}\mu(3^2 2^3) \\
& + \frac{15}{\kappa_2^7}\mu(3^2 2^4) - \frac{21}{\kappa_2^8}\mu(3^2 2^5) + \frac{28}{\kappa_2^9}\mu(3^2 2^6) - \ldots\Big\} \qquad (11.102)
\end{aligned}$$

We now express the product μ's in terms of product κ's by using equation (11.27)
and identifying coefficients. For a normal distribution $\kappa(32^r) = 0$ and we will take our
approximation to order $n^{-4}$, so that κ's of five parts or more may be neglected. We
then find,


a Б eee Ө") + S tear) + anon 


= 8000829) + 9. (992) (22) кзз) 1 


d (6«(3*2*)«(22) + авзо) ооз) 


+ к(3°)к(24) + 3«(32)3(22)) — 115 (322) (22 + 10к(3°)к(2з)ук(эг)у} 


28 
—.15к(32)к3(922 о 
P ао ] иа с - (11.103) 
Substituting the values of equations (11.31) to (11.85) we find, after some purely algebraic
reduction,

$$\operatorname{var} x = 1 - \frac{6}{n} + \frac{22}{n^2} - \frac{70}{n^3} + \ldots \qquad (11.104)$$

In a similar way (for details, see E. S. Pearson, 1930) we find

$$\mu_4(x) = 3 - \frac{1056}{n^2} + \frac{24{,}132}{n^3} + \ldots \qquad (11.105)$$

$\mu_3(x)$ is zero, for the distribution is symmetrical.

Thus it appears that as n → ∞ the second moment of x tends to unity and the fourth
moment to 3, which is in conformity with tendency to normality. But the tendency is
by no means very rapid. When n = 100 the variance is approximately 0·942 and in
assuming x to be distributed with unit variance we should commit an error of about
6 per cent.
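The figure quoted for n = 100 is readily checked from the series $1 - 6/n + 22/n^2 - 70/n^3$. The sketch below also compares the series with a closed form for the variance; that closed form is an assumption brought in from outside this passage, namely the standard normal-theory result $\operatorname{var}(k_3/k_2^{3/2}) = 6n(n-1)/\{(n-2)(n+1)(n+3)\}$, which gives $\operatorname{var} x = (n-1)^2/\{(n+1)(n+3)\}$. The function names are hypothetical.

```python
from fractions import Fraction


def var_x_series(n):
    """Series approximation 1 - 6/n + 22/n^2 - 70/n^3 to var x."""
    return 1 - Fraction(6, n) + Fraction(22, n ** 2) - Fraction(70, n ** 3)


def var_x_exact(n):
    """Closed form (n-1)^2 / ((n+1)(n+3)); assumed, not taken from the text."""
    return Fraction((n - 1) ** 2, (n + 1) * (n + 3))


n = 100
series = var_x_series(n)
exact = var_x_exact(n)
shortfall = 1 - float(series)   # departure of the variance from unity
```

At n = 100 the two values differ only in the sixth decimal place, and the shortfall from unity is just under 0·06, in agreement with the error of about 6 per cent. mentioned in the text.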


11.24. There are two ways of improving on the first approximation that x is normally
distributed with unit variance. In the first place we may consider a transformation to
a new variate ξ, chosen so that ξ is normally distributed to order $n^{-2}$. Secondly,
we may fit a Pearson curve to the distribution of x, using the values of moments given
by (11.104) and (11.105). The appropriate curve is the Type VII


a cc (1+%) `4 mee oe T ш По 


The first line was adopted by Fisher (1928), who obtained the following transformation:

$$\xi = x\left(1 + \frac{3}{n} + \frac{91}{4n^2}\right) - \frac{3}{2n}\left(1 - \frac{111}{2n}\right)(x^3 - 3x) - \frac{33}{8n^2}(x^5 - 10x^3 + 15x) \qquad (11.107)$$

The second was adopted by E. S. Pearson (1930), who tabulated the 1 per cent. and 5 per
cent. significance points of (11.106), that is to say the values of the deviates x for various
values of n such that 99 per cent. and 95 per cent. of the total frequency of the sampling
distribution falls within a range of ± x on each side of the mean.


The Multivariate Case 

11.25. The foregoing results can be generalised to the multivariate case, and we 
give an outline of the extension to that of two variates. 

Given any bipartite number pp’ we shall have for any partition (pipi) (pspay* . . .} 
and the bivariate cumulant «,,, a k-statistic Æp whose mean value is xpp. Explicitly 


np (— Hp — 1) , (xm yiP xy limp y) 
b 1 ! Py = р P Я 
koa Т2 nl) (pil) (ply c. mal gee rs (0-108) 
In particular, corresponding to (11.22) we have 
1 ) 

ky = Po (2811 — $10801) 
hee 252 

eis (2782 — 22810 $11 — "820 Sor + 28% Sox) 

Л 1 
ky = ad {n2(n + 1), — n(n + l)sso So — 32(% — 1)S11 S20 

(11.109) 
— Sn(n + 1)Sa S10 + 6811 Sio + 6250 S10 S01 — 6 Sor sio} . . 5 
bam n В 2(n+1), Р 2(n--1). А (n —1) 
ы eoa” F1)S22 A 21 801 2 12 810 a S29 $02 


2(n — 1) , 8 2 2 2 2 6 " 
— ——— — 8i = 811 S19 S10 + — 82 Sig + — S20 Soi — — S% 82 
n 11 zm n 11 910 910 n n 01 n? 10 ?01 J 




In generalisation of the mean value functions of the k’s we may write, for example, 


Elke ku) = 16 j 


0 1 E 
2А О 
E(k E, kos) = “(0 1 z) 


with corresponding x’s. The latter may be expressed in terms of the cumulants of the 
bivariate distribution as in the univariate ; ients will now depend on partitions 


of bipartite numbers. Our rules still apply (and in particular the pattern functions 
but the numerical coefficients associated 
have to consider the number of ways of 


two-way partition of a bipartite number. 
An example will make the modification clear. 


Suppose we wish to find the coefficient of $\kappa_{33}\kappa_{11}^2$ in $\kappa\begin{pmatrix}2 & 2 & 1\\ 2 & 2 & 1\end{pmatrix}$. The total degree is 10 and, the orders of the product being 6, 2, 2, we have to consider arrays of type

    .  .  .  |  6
    .  .  .  |  2
    .  .  .  |  2
    ---------+----
    4  4  2  | 10

i.e. those we discussed above. The pattern functions are those we have already found.

For the numerical coefficients we have to regard the column totals as consisting of the two types of object in number (2, 2), (2, 2), and (1, 1) and the row totals (3, 3), (1, 1) and (1, 1). For instance, the array


might be written either as 


| 
(51) (L1) (1,1) | (3,3) | 
OT Wd, iy) eg Nun N 
(6,0 (01. ET p | 


(2,2) (2,2) (1,1) (5, 5) | 
or as 


(2,0) (0,2) (1, 1) | (3, 3) 


(0,1) Q,0) . (NNNM SOLIS 


Tt will be found that, 
У permuting the first two columns, 


d array to ether j 
кү Ма NI op NIE. Y Er s 
liar ifa 1): = 16. 


=w 


That in (11.111) and the permuted array is 


(a) a 


The total contribution is thus 20. The pattern function is $\dfrac{1}{(n-1)(n-2)}$.
In the same way it will be found that for the partitions 
а лаа Зас 
DW О LS S 
1 1 | 2 SNL 2 
4 4 2/10 4 4. 2] 10 
the coefficients are 48 and 8. Thus the desired coefficient of $\kappa_{33}\kappa_{11}^2$ is

$$\frac{20}{(n-1)(n-2)} + \frac{48}{(n-1)^2} + \frac{8}{(n-1)^2} = \frac{4(19n-33)}{(n-1)^2(n-2)}.$$
Example 11.6 


To find an exact expression for the covariance of the estimates of variance of two correlated variables, i.e. $\kappa\begin{pmatrix}2 & 0\\ 0 & 2\end{pmatrix}$.

This will clearly consist of three terms, in $\kappa_{22}$, $\kappa_{20}\kappa_{02}$ and $\kappa_{11}^2$. For the first we have the partition

    (2,0)  (0,2)  |  (2,2)
    --------------+-------
    (2,0)  (0,2)  |  (2,2)

with pattern function $\dfrac{1}{n}$ and numerical coefficient unity. For the second no contribution exists, the only arrangement being

    (2,0)    .    |  (2,0)
      .    (0,2)  |  (0,2)
    --------------+-------
    (2,0)  (0,2)  |  (2,2)

which has a vanishing pattern function. For the third term we have

    (1,0)  (0,1)  |  (1,1)
    (1,0)  (0,1)  |  (1,1)
    --------------+-------
    (2,0)  (0,2)  |  (2,2)

the pattern function for which is $\dfrac{1}{n-1}$ and numerical coefficient 2. Hence

$$\kappa\begin{pmatrix}2 & 0\\ 0 & 2\end{pmatrix} = \frac{1}{n}\kappa_{22} + \frac{2}{n-1}\kappa_{11}^2.$$
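The result of Example 11.6 is exact for any parent with finite fourth moments, so it can be verified by brute force on a small discrete parent. The sketch below is only an illustration under stated assumptions: the parent is taken to be uniform on four hypothetical points, samples of n = 3 are drawn independently (with replacement), and $\kappa_{22}$ is computed from the central moments as $\mu_{22} - \mu_{20}\mu_{02} - 2\mu_{11}^2$.

```python
from fractions import Fraction
from itertools import product


def central_moment(points, r, s):
    """mu_rs of the discrete uniform parent on `points`."""
    size = len(points)
    mx = Fraction(sum(x for x, _ in points), size)
    my = Fraction(sum(y for _, y in points), size)
    return sum((x - mx) ** r * (y - my) ** s for x, y in points) / size


def k2(values):
    """Second k-statistic (unbiased variance estimate) of one variate."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def exact_covariance(points, n):
    """Covariance of k_20 and k_02 over all equally likely ordered samples."""
    samples = list(product(points, repeat=n))
    a = [k2([x for x, _ in s]) for s in samples]
    b = [k2([y for _, y in s]) for s in samples]
    total = len(samples)
    mean_a, mean_b = sum(a) / total, sum(b) / total
    return sum(u * v for u, v in zip(a, b)) / total - mean_a * mean_b


def formula(points, n):
    """kappa_22 / n + 2 kappa_11^2 / (n - 1), as in Example 11.6."""
    mu20 = central_moment(points, 2, 0)
    mu02 = central_moment(points, 0, 2)
    mu11 = central_moment(points, 1, 1)
    mu22 = central_moment(points, 2, 2)
    kappa22 = mu22 - mu20 * mu02 - 2 * mu11 ** 2
    return kappa22 / n + 2 * mu11 ** 2 / (n - 1)


# Hypothetical parent distribution: four equally likely (x, y) points.
parent = [(0, 0), (1, 2), (2, 1), (4, 3)]
sample_size = 3
```

Because rational arithmetic is used throughout, `exact_covariance(parent, sample_size)` and `formula(parent, sample_size)` can be compared for exact equality rather than to a tolerance.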


11.26. In conclusion it may be noted that the method of expectations may be used to derive sampling moments of the distribution of samples from a finite population. The algebra becomes much more complex because the sample values are no longer independent and we cannot, for example, write $E\big(\sum_{j \neq k} x_j x_k\big) = n(n-1)\mu_1'^2$. Tschuprow (1923) and Isserlis (1931) have investigated the subject systematically, the latter giving formulae for the first four moments of the mean and the first four of the second moment. Some of these had been obtained earlier by Tschuprow himself, Neyman (1925) and Church (1926). We quote the following formulae, in which N represents the number in the population:

Moments of the Mean:

$$E(m_1') = \mu_1' \tag{11.112}$$

$$E(m_1' - \mu_1')^2 = \frac{N-n}{n(N-1)}\,\mu_2 \tag{11.113}$$

$$E(m_1' - \mu_1')^3 = \frac{(N-n)(N-2n)}{n^2(N-1)(N-2)}\,\mu_3 \tag{11.114}$$

$$E(m_1' - \mu_1')^4 = \frac{N-n}{n^3(N-1)(N-2)(N-3)}\left\{(N^2 - 6Nn + N + 6n^2)\mu_4 + 3N(N-n-1)(n-1)\mu_2^2\right\} \tag{11.115}$$

Moments of the Sample Variance:

$$E(m_2) = \frac{(n-1)N}{n(N-1)}\,\mu_2 \tag{11.116}$$

$$E\left\{m_2 - \frac{(n-1)N}{n(N-1)}\mu_2\right\}^2 = \frac{N(N-n)(n-1)}{n^3(N-1)^2(N-2)(N-3)}\left\{(N-1)(Nn-N-n-1)\mu_4 - (N^2 n - 3n - 3N^2 + 6N - 3)\mu_2^2\right\} \tag{11.117}$$

Church also gives the third and fourth moments of the sample variance, the formulae tak ... limited, but it would be interesting to inquire how far the combinatorial method appropriate to k-statistics can be extended to the case of the finite population.*
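The finite-population formulae for the mean, (11.113) to (11.115), and the mean value of the sample variance, (11.116), can be confirmed by complete enumeration of the samples drawn without replacement. The following is a minimal sketch, assuming a hypothetical population of N = 6 values and samples of n = 2; every name in it is illustrative.

```python
from fractions import Fraction
from itertools import combinations


def mean(values):
    return Fraction(sum(values), len(values))


def central(values, r):
    """r-th moment about the mean, with divisor equal to the number of values."""
    m = mean(values)
    return sum((v - m) ** r for v in values) / len(values)


# Hypothetical finite population and sample size.
population = [1, 2, 4, 8, 9, 12]
N, n = len(population), 2

mu2, mu3, mu4 = (central(population, r) for r in (2, 3, 4))
samples = list(combinations(population, n))   # all equally likely samples
count = len(samples)

# Moments of the sample mean about the population mean, by enumeration.
dev = [mean(s) - mean(population) for s in samples]
second, third, fourth = (sum(d ** r for d in dev) / count for r in (2, 3, 4))

# Mean value of the sample variance m_2 (divisor n), by enumeration.
mean_m2 = sum(central(s, 2) for s in samples) / count

# Right-hand sides of (11.113) to (11.116).
rhs_113 = Fraction(N - n, n * (N - 1)) * mu2
rhs_114 = Fraction((N - n) * (N - 2 * n), n ** 2 * (N - 1) * (N - 2)) * mu3
rhs_115 = Fraction(N - n, n ** 3 * (N - 1) * (N - 2) * (N - 3)) * (
    (N ** 2 - 6 * N * n + N + 6 * n ** 2) * mu4
    + 3 * N * (N - n - 1) * (n - 1) * mu2 ** 2
)
rhs_116 = Fraction((n - 1) * N, n * (N - 1)) * mu2
```

With exact fractions the enumerated moments `second`, `third`, `fourth` and `mean_m2` can be compared directly with `rhs_113` to `rhs_116`.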
NOTES AND REFERENCES

For earlier work on the expectations of moments see Tschuprow (1919) and Church (1925). Thiele (1903) seems to have been the first to appreciate the possibilities of using other symmetric functions, but owing to the fact that he defined his sample seminvariants to be the same function of the observations as the parent seminvariants (our cumulants) are of the parent values, his formulae remained com ... Investigations on similar lines were carried out by Cr ...

k-statistics were introduced by Fisher in 1928 ... S. Pearson (1930), and Hsu and ... other seminvariant statistics ... seminvari ... of binary quantics. The reader who refers to Fisher's basic paper of 1928 should beware of misprints. Methods of deriving bivariate formulae f ...

* See Irwin and Kendall, Ann. Eug. Lond., 12, 138, for a derivation of these formulae from those ...



Church, A. E. R. (1925), “On the moments of the distributions of squared standard deviations for samples of n drawn from an indefinitely large population,” Biometrika, 17, 79.

—— (1926), “On the means and squared standard deviations of small samples from any population,” Biometrika, 18, 321.

Craig, C. C. (1929), “An application of Thiele's seminvariants to the sampling problem,” Metron, 7, 3.

Dressel, P. L. (1940), “Statistical seminvariants and their estimates with particular emphasis on their relation to algebraic invariants,” Ann. Math. Statist., 11, 33.

Fisher, R. A. (1928), “Moments and product-moments of sampling distributions,” Proc. Lond. Math. Soc., 30, 199-238.

—— (1930), “The moments of the distribution for normal samples of measures of departure from normality,” Proc. Roy. Soc. A, 130, 16.

—— and J. Wishart (1931), “The derivation of the pattern formulae of two-way partitions from those of simpler patterns,” Proc. Lond. Math. Soc., 33, 195-208.

Hsu, C. T., and Lawley, D. N. (1939), “The derivation of the fifth and sixth moments of b₂ in samples from a normal population,” Biometrika, 31, 238.

Isserlis, L. (1931), “On the moment distributions of moments in the case of samples drawn from a limited universe,” Proc. Roy. Soc., 132 A, 586.

Kendall, M. G. (1940a), “Some properties of k-statistics,” Ann. Eugen. Lond., 10, 106.

—— (1940b), “Proof of Fisher's rules for ascertaining the sampling semi-invariants of k-statistics,” Ann. Eugen. Lond., 10, 215.

—— (1940c), “The derivation of multivariate sampling formulae from univariate formulae by symbolic operation,” Ann. Eugen. Lond., 10, 392.

—— (1942), “On seminvariant statistics,” Ann. Eugen. Lond., 11, 300.

Neyman, J. (1925), “Contributions to the theory of small samples drawn from a finite population,” Biometrika, 17, 472.

Pearson, E. S. (1930), “A further development of tests for normality,” Biometrika, 22, 239.

St. Georgescu, N. (1932), “Further contributions to the sampling problem,” Biometrika, 24, 65.

Thiele, T. N. (1903), Theory of Observations, London, C. and E. Layton (reprinted in Ann. Math. Statist., 2, 165).

Tschuprow, A. A. (1919), “On the mathematical expectation of the moments of frequency distributions,” Biometrika, 12, 140 and 185, and (1921) 13, 283.

—— (1923), “On the mathematical expectation of moments of frequency distributions,” Metron, 2, 461 and 646.

Wishart, J. (1928), “A problem in combinatorial analysis giving the distribution of certain moment statistics,” Proc. Lond. Math. Soc., 29, 309.

—— (1929b), “The correlation between product moments of any order in samples from a normal population,” Proc. Roy. Soc. Edin., 49, 78.

—— (1930), “The derivation of certain high-order sampling product-moments from a normal population,” Biometrika, 22, 224.

—— (1933), “A comparison of the semi-invariants of the distributions of moment and semi-invariant estimates in samples from an infinite population,” Biometrika, 25, 52.


EXERCISES 
11.1. Show that the pattern functions of the following patterns :— ae 
"e X x 056 x x | 
ar ES d ie OC 
S ON > SON  - 
are 1 and els Me respectively. 
(n — 1)? У n(n — 1)? 
(Fisher, 1928.) k 
* 
11.2. Show that the pattern function of the pattern * 
oS X ЗА 
е САГУ эс. 
x А 1 (— 1)7- n. 
with p-columns is 100 ME т 
P = i} (Fisher, 1928.) j | 


11.3. Verify the formulae of equations (11.33) and (11.39).


11.4. Show that the generating function of the moments of the k-statistics, (e бе ) 
1, t \ 
Eius... e Dg узат 5 a) 
л. 


is given by 
[ew {t,K, + tKa + tKa +. . .} exp {rss + a n. 1] 
Ы r-0 


: : 0 В 
where K, is the same function of the operators 2; 5 i, is of the observations 2 and s, = X(a*.) 


Deduce that 


Ky (sp) = p! 
і Eje, + + -) =0 
where (рур...) is any partition of p. (Fisher, 1928.) | 


Note that if M(z) is the moment-generating function of x, the mean value of 


£ = f(z) will be [ ie 2) |, and that of é by [{/ e j| x] . 


Hence that the generating c of the moments of £ may p written 


м) = [exe {u(%)} ug] . 


11.5. Show that 


TAE, (es T h, 22, 2,)) = Кр + —hP 
and hence that 
Sk, = p! 
Sky =0 qp 


where S, is the s i 2 i 
, ame function of the operators Эл 28 Sp ÍS of the observations v, 


(Kendall, 1940a.) 




11.6. Show that the generating function of the moments of the -statistics is given by 


СЕ + кз а expeti т... 3] i 


1! eo 


and hence derive the result of the previous exercise. 

н . (Kendall, 1942.) 
11.7. Use Exercise 11.5 to show that, in the expression of b, in terms of the sym- 
к i A 

metric sums s, the sum of the coefficients is x 
Show similarly that if 


S, = Ask, + Aui ky d... Any, «eae Dn Er tss ky. Бо 
then 


(Kendall, 19400.) 


11.8. Referring to the result of 11.22, show that for a normal parent population the effect of adding a new part 2 to κ(a b c . . .) is to give pattern functions $\dfrac{1}{n-1}$ times those of the original. Show also that the effect on the numerical coefficient in an array is to multiply by twice the number of rows in the array. Deduce that the effect of adding a new part 2 is equivalent to operating by

$$\frac{2\kappa_2^2}{n-1}\,\frac{\partial}{\partial\kappa_2}.$$

(Fisher and Wishart, 1930.)


11.9. Use the previous exercise to establish equations (11.67) and (11.68). 


11.10. In generalisation of Exercise 11.8 show that for a multivariate normal parent 
the effect of adding a covariance Б, (р, q referring to the pth and gth variates) is equivalent 
to operating by i d 
>> ma (кав + Kosa) qe? 


rs 


where $\kappa_{pq}$ is the covariance of the variates p and q.


Я tains a column with three entries; if the patterns obtained 

y erecta, P ae ee (1) amalgamating the three rows, (2) amalgamating the 

Pairs of three rows, and (3) leaving the rows owe opo nee A, Bı, Bz, В, and C respec- 

tively, show that the pattern function for the original pattern is 

2 

pd а er 
(n — 1)(n — 2) (n — 1)n — 2) 


Deduce that the function of the pattern 


3 


X OX ye 
2S OKT РУ | 
ОР * e | 
х х } 
is n? — 8n? + Yn 4- 2 


(n — Y (n — 2) — 3y 
(Fisher and Wishart, 1930.) 
11.12. Show that | 


A ut m1 Кеа B 1 
T. art ques E деу 
2 0 ii " 
and «(0 2) = сказ $ ai 


Ki k 
See cal SS с шй 
V (20K 02) V (Keokon) 
then to order n-~! for normal samples 


and hence that if p= 


1 212 
axem CI р?)?. 


11.13. Show that for a bivariate normal population $k_{tu}$ and $k_{vw}$ have zero covariance unless t + u = v + w.
(Wishart, 1929b.)


11.14. Show that for a bivariate normal population 


1 X; pat, | al 
ерис ee crim nad ш 
er 2z0,0,(l — p?) хр{ (S 0105 2 оў 
tu ,. 
t t — 1)! Ajotev 
var (ku) = H(t J zu 2,8 ji mg 7: 0: P( —1, —w, 1, pt), 


j=1 
where A/0* is the jth difference of the kth power of zero and F refers to the hypergeometric 
function. 

(Wishart, 1929р.) 


11.15. Use the methods of this chapter to verify that to order n~t 


1 А ы | 
var (та) => (Me — Hi + 1бизи„ — 8дд). ] 


11.16. Referring to Exercise 11.4, show that the moment-generating function 


M'(t2, Ts, . . .) of the statistics kp, bs ka a Ser л ts | 
kt ki kj. 


is given symbolically by | 


‚ К, 
M'(vs, Tassa) = oxp raK, tug dak ja, hes si 
E 


UN 


where M(ts, f . . -) is the moment-generating function of the k-statistics and K, is the same 


function of ai 5 Г. is of x. 


Noting that in normal samples the distribution of Г, is independent of that of the other 


2kot, \—#—-1) 
= 3) show that 
» —1 


statistics and that its moment-generating function is (1 — 


Qty \—їп—1) 
M'(vs та...) = exp apt tht, 36 = =” MMC л г 


Hence, if the number r refers to /„, that 


5e453a) — 5c453a2—i d E 2к \ -0 

pl: « « 50089) = п... 5048342 pl 1 

where 9 =а +0 +с+... 
ж.= Їй ч Ж; чо 
and hence that u(. . . 504939) = u(. . . 504532370" Din ЕЛ) P" M =] ed. 
Deduce that 
vas k 6n(n — 1) 
(OW (n — 2)(n + 1)(m + 3) 
AN 108n*(n — l)*(n?* + 27n — 70) 
m keg} (в — 2)*(n + 1)(n + 3)(% F 5)(n + T)(n + 9) 


Hence verify the formulae of equations (11.104) and (11.105). (This remarkable result 
is due to Fisher (1930). The independence of X, and the other statistics may be seen by 
considering the n-fold sample space, k, appearing as the square of a length and the others 
as angles (cf. Geary (1933), Biometrika, 25, 184).) 


11.17. Defining y by the relation 
mes JE — 1)(% — 2)(n — 3) b. 
24n(n + 1) ki 
Show that the moments of the distribution of y in samples from a normal population are 


fa =0 
2 
jt 63 ade {88 532 


n n? ms 


by im NE 1 65 | 4811 136,605 
n 2n ' Sn? UGE oce 


E 4. $88 . 32,196 _ 1,118,388 
n n? n3 


(E. S. Pearson, 1930, by the method of 11.23, before the exact results of the previous 
exercise had been given. He fitted a Pearson Type IV to the distribution by using these 
moments, and tabulated the 1 per cent. and 5 per cent. significance points.) ; * 

11.18. If $K_r$ is the rth k-statistic in a population of N members and $E_n$ refers to the mean value of a sample of n in that population, show that $E_n(k_r) = K_r$. Deduce equations (11.113) and (11.116). Show further that if the population is symmetrical, $k_1$ is uncorrelated with any k-statistic of even order.
CHAPTER 12 
THE χ²-DISTRIBUTION


12.1. Among the sampling distributions of current statistical theory the normal distribution is perhaps of widest application in virtue of the tendency of many statistics to normality in large samples, irrespective of the nature of the parent population. There is one other distribution, closely related to the normal, which has a somewhat similar general applicability and we give an account of it in this chapter.

Suppose we have a number of compartments or cells determined by specified ranges of a variate-value or by some qualitative character, such as the intervals of a univariate frequency-distribution, the cells of a bivariate distribution, or a simple classification of individuals into two classes, A and not-A. Suppose these cells are filled by random sampling from a parent population and that in the parent the proportion of members in the jth cell is $\pi_j$. In a sample of n there will occur proportions of, say, $p_j$ in the jth cell, the observed numbers being accordingly $np_j$. If the sampling were such as to give an exact representation of the parent these numbers would be $n\pi_j$. Our fundamental problem is to determine how far the $np_j$ can, to any acceptable degree of probability, diverge from the $n\pi_j$ by random sampling fluctuations. We shall then be able to test the accuracy of the hypothesis on which the π's were determined.

A few examples from material occurring in earlier chapters will illustrate the problem. In Table 5.1 on page 117 were given the actual occurrences of throws of dice, n being 26,306, the cells being eleven in number according to the number of “successes,” and a third column showing the theoretical frequencies based on the hypothesis that the sampling obeyed a binomial law. The observed frequencies are our np's and the theoretical frequencies our nπ's. The question is, are the differences between the two such as can have arisen by sampling fluctuations alone? If not, then we must reject our hypothesis as to the generation of these dice-throws according to the binomial law.

Again, in the table of Example 8.6 were ... inoculation against cholera. The question which interests us here is whether inoculation does in fact restrict or prevent attack. If it does not, we expect ... e.g. the proportion of attacked in the former would be ... × 279 = 23.5 approximately as against an observed 3. The former number is an nπ and the latter an np. Once again we should examine the differences, and if they were large enough to be inexplicable on the basis of sampling alone, should reject the hypothesis of independence of inoculation and attack, concluding that inoculation was to some extent preventive.


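12.2. Consider then samples of n with a division of the possible classification into p cells; and suppose the members distributed simply at random in these cells.

Then the probability of there being $l_1$ members in the first cell, $l_2$ in the second, and so forth, is the term in $\pi_1^{l_1}\pi_2^{l_2}\ldots$ in the multinomial form $(\pi_1 + \pi_2 + \ldots + \pi_p)^n$, that is to say, is

$$T = \frac{n!}{l_1!\,l_2!\cdots l_p!}\,\pi_1^{l_1}\pi_2^{l_2}\cdots\pi_p^{l_p}. \tag{12.1}$$

If the l's are not small we have, in virtue of Stirling's approximation to the factorial,

$$T = \frac{n^{n+\frac12}e^{-n}\sqrt{2\pi}\;\pi_1^{l_1}\pi_2^{l_2}\cdots\pi_p^{l_p}}{\prod_{j=1}^{p}\left\{l_j^{\,l_j+\frac12}e^{-l_j}\sqrt{2\pi}\right\}} \tag{12.2}$$

and since $n = \Sigma l$ this becomes

$$T = (\text{constant}) \times \prod_{j=1}^{p}\left(\frac{n\pi_j}{l_j}\right)^{l_j+\frac12}. \tag{12.3}$$

Now put

$$\lambda = n\pi \quad\text{and}\quad \xi = \frac{l-\lambda}{\sqrt{\lambda}}.$$

Then from (12.3) we have

$$\log T = \log(\text{constant}) - \Sigma\left(l+\tfrac12\right)\log\frac{l}{\lambda} = \log(\text{constant}) - \Sigma\left(\lambda + \xi\sqrt{\lambda} + \tfrac12\right)\log\left(1 + \frac{\xi}{\sqrt{\lambda}}\right).$$

If λ is large, ξ will be small compared with $\sqrt{\lambda}$ and to first order we have, expanding the logarithm,

$$\log T - \log(\text{constant}) = -\Sigma\left(\lambda + \tfrac12 + \xi\sqrt{\lambda}\right)\left(\frac{\xi}{\sqrt{\lambda}} - \frac{\xi^2}{2\lambda}\right) = -\Sigma\left(\xi\sqrt{\lambda} + \tfrac12\xi^2 + O(\lambda^{-\frac12})\right). \tag{12.4}$$

Now

$$\Sigma(\xi\sqrt{\lambda}) = \Sigma(l - \lambda) = n - n = 0$$

and hence, to order $\lambda^{-\frac12}$,

$$\log T - \log(\text{constant}) = -\tfrac12\Sigma\xi^2, \qquad T \propto \exp\left(-\tfrac12\Sigma\xi^2\right). \tag{12.5}$$

Hence the frequency T varies as that of the sum of squares of p normal variates of unit variance which are subject to the constraint $\Sigma(\xi\sqrt{\lambda}) = 0$ but otherwise independent. Now put

$$\chi^2 = \Sigma\xi^2 = \Sigma\frac{(l-\lambda)^2}{\lambda}. \tag{12.6}$$

Then the frequency of χ² is that of the sum of squares of (p − 1) independent normal variates of unit variance. Its distribution is then, from Example 10.4, given by

$$dF = \frac{1}{2^{\frac12(p-1)}\Gamma\left(\frac{p-1}{2}\right)}\,e^{-\frac12\chi^2}(\chi^2)^{\frac12(p-3)}\,d(\chi^2) \tag{12.7}$$

$$\phantom{dF} = \frac{1}{2^{\frac12(p-3)}\Gamma\left(\frac{p-1}{2}\right)}\,e^{-\frac12\chi^2}\chi^{p-2}\,d\chi. \tag{12.8}$$

Furthermore, if there are certain constraints on the cell frequencies expressible by κ linear equations among them, the distribution remains of the same form but is now

$$dF = \frac{1}{2^{\frac12(\nu-2)}\Gamma\left(\frac{\nu}{2}\right)}\,e^{-\frac12\chi^2}\chi^{\nu-1}\,d\chi, \tag{12.9}$$

where ν, known as the number of degrees of freedom, is p − κ, that is the number of cells whose frequencies can be assigned without restriction. (Cf. Example 10.4.)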
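One exact consequence of the multinomial set-up is that the mean value of $\Sigma(l-\lambda)^2/\lambda$ is p − 1, which is also the mean of the limiting distribution with ν = p − 1. The sketch below enumerates every possible set of cell frequencies for a small case; the cell probabilities, the sample size and all function names are assumptions made for the illustration.

```python
from fractions import Fraction
from math import factorial


def compositions(n, p):
    """Yield every tuple of p non-negative integers that sums to n."""
    if p == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, p - 1):
            yield (first,) + rest


def multinomial_probability(counts, probs):
    """The multinomial term n!/(l1! l2! ...) * pi1^l1 * pi2^l2 * ..."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    probability = Fraction(coefficient)
    for c, pi in zip(counts, probs):
        probability *= pi ** c
    return probability


def chi_square(counts, probs):
    """Sum over cells of (l - lambda)^2 / lambda with lambda = n * pi."""
    n = sum(counts)
    return sum((c - n * pi) ** 2 / (n * pi) for c, pi in zip(counts, probs))


# Hypothetical cell probabilities and sample size.
cell_probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
sample_size = 12

total_probability = sum(
    multinomial_probability(c, cell_probs)
    for c in compositions(sample_size, len(cell_probs))
)
mean_chi_square = sum(
    multinomial_probability(c, cell_probs) * chi_square(c, cell_probs)
    for c in compositions(sample_size, len(cell_probs))
)
```

Here `total_probability` is exactly 1 and `mean_chi_square` is exactly 2 = p − 1, whatever the sample size; only the shape of the distribution, not its mean, depends on the large-sample approximation.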


12.3. The distribution (12.9) is usually known as the χ²-distribution, though it is actually that of χ. However, χ² and not χ is the quantity which always occurs in practical calculations and most tables of the distribution function have χ² as the argument. (12.9) is only an approximation, and relies on the fact that for large λ the Stirling approximation to the factorial will hold and that deviations from theoretical λ's are negligible to order $n^{-\frac12}$. In point of fact, the approximation is very good and the χ²-distribution may confidently be applied when the theoretical cell frequencies are, say, not less than 20.

Before dealing with the applications of the above results we will consider in more detail the properties of the distribution.


Properties of the χ²-Distribution

12.4. Writing χ² = ξ we have for the distribution

$$dF = \frac{1}{2^{\frac{\nu}{2}}\Gamma\left(\frac{\nu}{2}\right)}\,e^{-\frac{\xi}{2}}\,\xi^{\frac{\nu}{2}-1}\,d\xi, \qquad 0 \le \xi \le \infty, \tag{12.10}$$

a Pearson Type III distribution. The characteristic function is

$$\phi(t) = \frac{1}{(1-2it)^{\frac{\nu}{2}}}, \tag{12.11}$$

whence, for the cumulants, we have

$$\kappa_r = \nu\,2^{r-1}(r-1)! \tag{12.12}$$

and for moments about the mean

$$
\begin{aligned}
\mu_2 &= 2\nu\\
\mu_3 &= 8\nu\\
\mu_4 &= 48\nu + 12\nu^2\\
\mu_5 &= 32\nu(5\nu + 12)\\
\mu_6 &= 40\nu(3\nu^2 + 52\nu + 96).
\end{aligned}
\tag{12.13}
$$

Since $\kappa_r$ is linear in ν, $\mu_r$, which can contain only $\left[\frac{r}{2}\right]$ powers of κ, must be of degree $\left[\frac{r}{2}\right]$ in ν, i.e. $\frac{r}{2}$ if r is even and $\frac{r-1}{2}$ if r is odd.

As ν tends to infinity the χ²-distribution tends to normality, for in standard measure we have

$$\phi(t) = e^{-\frac{it\nu}{\sqrt{2\nu}}}\left(1 - \frac{2it}{\sqrt{2\nu}}\right)^{-\frac{\nu}{2}},$$

$$\log\phi(t) = -it\sqrt{\frac{\nu}{2}} - \frac{\nu}{2}\log\left(1 - it\sqrt{\frac{2}{\nu}}\right) = -\tfrac12 t^2 + O(\nu^{-\frac12}) \to -\tfrac12 t^2.$$

The tendency is, however, rather slow, and there are better approximations as we shall see in a moment.
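The moments (12.13) follow from the cumulants (12.12) by the usual relations between central moments and cumulants, $\mu_4 = \kappa_4 + 3\kappa_2^2$, $\mu_5 = \kappa_5 + 10\kappa_3\kappa_2$ and $\mu_6 = \kappa_6 + 15\kappa_4\kappa_2 + 10\kappa_3^2 + 15\kappa_2^3$. A short sketch of that arithmetic, with illustrative function names, is:

```python
from math import factorial


def chi2_cumulant(r, nu):
    """kappa_r = nu * 2^(r-1) * (r-1)!, as in (12.12)."""
    return nu * 2 ** (r - 1) * factorial(r - 1)


def chi2_central_moments(nu):
    """Central moments mu_2 .. mu_6 built from the cumulants."""
    k = {r: chi2_cumulant(r, nu) for r in range(2, 7)}
    return {
        2: k[2],
        3: k[3],
        4: k[4] + 3 * k[2] ** 2,
        5: k[5] + 10 * k[3] * k[2],
        6: k[6] + 15 * k[4] * k[2] + 10 * k[3] ** 2 + 15 * k[2] ** 3,
    }


def chi2_moments_closed_form(nu):
    """The closed forms quoted in (12.13)."""
    return {
        2: 2 * nu,
        3: 8 * nu,
        4: 48 * nu + 12 * nu ** 2,
        5: 32 * nu * (5 * nu + 12),
        6: 40 * nu * (3 * nu ** 2 + 52 * nu + 96),
    }
```

The two dictionaries agree for every integer ν, and the degree of $\mu_r$ in ν is seen to be $[r/2]$ as stated.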


12.5. The frequency curves given by (12.9) extend from zero to infinity. In the case ν = 1 the curve is merely the positive half of the normal curve. In other cases it is zero at the origin, rises to a mode and then falls away again as χ tends to infinity. The maximum ordinate of the χ-distribution (not the χ²-distribution) is given by

$$\frac{d}{d\chi}\left(e^{-\frac12\chi^2}\chi^{\nu-1}\right) = 0,$$

namely, by

$$\chi^2 = \nu - 1, \tag{12.14}$$

and that of the χ²-distribution or ξ-distribution by

$$\frac{d}{d\xi}\left(e^{-\frac12\xi}\,\xi^{\frac{\nu}{2}-1}\right) = 0,$$

namely, by

$$\xi = \chi^2 = \nu - 2. \tag{12.15}$$

The skewness of the ξ-distribution in the form (mean − mode)/(standard deviation) is then

$$\frac{\nu - (\nu - 2)}{\sqrt{2\nu}} = \sqrt{\frac{2}{\nu}}.$$
12.6. The distribution function of (12.10) is an incomplete Γ-function. We have

$$F(\xi) = \frac{1}{2^{\frac{\nu}{2}}\Gamma\left(\frac{\nu}{2}\right)}\int_0^{\xi} e^{-\frac{u}{2}}u^{\frac{\nu}{2}-1}\,du = \frac{\Gamma_{\frac12\chi^2}\left(\frac{\nu}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)},$$

or, in the notation of Pearson's tables,

$$= I\left(\frac{\chi^2}{\sqrt{2\nu}},\ \tfrac12(\nu-2)\right).$$

Some special tables have, however, been constructed.

(a) Elderton's table (Tables for Statisticians and Biometricians, Part I) gives the values of $P = \int_{\chi^2}^{\infty} dF = 1 - F(\xi) = 1 - F(\chi^2)$ for values of ν = 2 (1) 29 and χ² = 1 (1) 30; 30 (10) 70. In this table, which is to six places, our ν is denoted by n′ − 1.

(b) A table by Yule, reproduced at the end of this volume, supplements Elderton's by giving P for ν = 1, χ² = 0 (0.01) 1; 1 (0.1) 10.

(c) Kelley (1938) gives a four-place table of P for χ from 0 (0.1) 4.1 and ν = 1 (1) 10; 12, 15, 19, 30.

(d) Fisher and Yates (1938) give tables in an inverted form, showing the values of χ² for certain values of P and ν, namely P = 0.99, 0.98, 0.95, 0.90 (0.10) 0.10, 0.05, 0.02, 0.01 and 0.001; and ν = 1 (1) 30.

(e) Thompson (Biometrika, 32 (1941), 187) gives tables in the inverted form for P = 0.995, 0.990, 0.975, 0.950, 0.750, 0.500, 0.250, 0.100, 0.050, 0.025, 0.010, 0.005 and ν = 1 (1) 30 (10) 100.

For general use the incomplete Γ-function tables are probably the best, as interpolation in Elderton's table does not give very great accuracy. The significance points tabled by Fisher and Yates are, however, sufficient for many practical applications of the χ²-distribution in carrying out statistical tests. We reproduce at the end of the volume a diagram which will serve for such purposes. It shows, for co-ordinates ν and χ², the curves P = constant, so that for given ν and χ² it is easy to determine whether P falls between any of the values for which the curves are drawn.
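Where tables are not to hand, P can be computed directly for integral ν. The sketch below is not the method of any of the tables cited; it assumes the standard finite series for the χ² integral (a sum of Poisson-type terms for even ν, and the normal integral plus a finite sum for odd ν) and uses `statistics.NormalDist` from the Python standard library. The function name is illustrative.

```python
from math import exp, sqrt
from statistics import NormalDist


def chi2_upper_tail(chi2, nu):
    """P = probability that chi-square with nu degrees of freedom exceeds chi2.

    nu must be a positive integer and chi2 non-negative.
    """
    if nu % 2 == 0:
        # Even nu: exp(-x/2) * sum_{r=0}^{nu/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, nu // 2):
            term *= (chi2 / 2) / r
            total += term
        return exp(-chi2 / 2) * total
    # Odd nu: 2 Q(chi) + 2 Z(chi) * sum_{r=1}^{(nu-1)/2} chi^(2r-1) / (1.3.5...(2r-1)),
    # where Q is the upper normal tail and Z the normal ordinate.
    chi = sqrt(chi2)
    normal = NormalDist()
    total, term = 0.0, chi
    for r in range(1, (nu - 1) // 2 + 1):
        total += term
        term *= chi2 / (2 * r + 1)
    return 2 * (1 - normal.cdf(chi)) + 2 * normal.pdf(chi) * total
```

As a usage check, the familiar 5 per cent. points are recovered: `chi2_upper_tail(3.841, 1)`, `chi2_upper_tail(5.991, 2)` and `chi2_upper_tail(18.307, 10)` are all close to 0.05.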


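12.7. Except for Thompson's table, the tables do not cover the region for which ν > 30, and for such values an approximation to the normal distribution may be employed. There are two such in common use:

(a) (Fisher) that $\sqrt{2\chi^2}$ is normally distributed about mean $\sqrt{2\nu - 1}$ with unit variance;

(b) (Wilson and Hilferty (1931)) that $\left(\dfrac{\chi^2}{\nu}\right)^{\frac13}$ is normally distributed about mean $1 - \dfrac{2}{9\nu}$ with variance $\dfrac{2}{9\nu}$. The second is more accurate, but involves more arithmetical work in applications.

The relative speed of approach to normality of χ² and $\sqrt{2\chi^2}$ may be compared as follows:

For χ² we have, from (12.13),

$$\gamma_1 = \sqrt{\beta_1} = \sqrt{\frac{8}{\nu}} \tag{12.17}$$

$$\gamma_2 = \beta_2 - 3 = \frac{12}{\nu}. \tag{12.18}$$

For the moments of χ we have

$$\mu_r' = \frac{1}{2^{\frac12(\nu-2)}\Gamma\left(\frac{\nu}{2}\right)}\int_0^{\infty} e^{-\frac12\chi^2}\chi^{\nu+r-1}\,d\chi = 2^{\frac{r}{2}}\,\frac{\Gamma\left(\frac{\nu+r}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}. \tag{12.19}$$

Thus

$$\mu_1' = \sqrt{2}\,\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}.$$

Using the expansion

$$\log\Gamma(x+1) \sim \tfrac12\log(2\pi) + \left(x + \tfrac12\right)\log x - x + \frac{1}{12x} - \frac{1}{360x^3} + \text{etc.},$$

an extended form of Stirling's formula, we find after substitution and reduction

$$\mu_1' = \sqrt{\nu}\left(1 - \frac{1}{4\nu} + \frac{1}{32\nu^2} + \frac{5}{128\nu^3} + \ldots\right)$$

whence

$$\mu_1'^2 = \nu - \frac12 + \frac{1}{8\nu} + \ldots$$

Also

$$\mu_2' = \nu, \qquad \mu_3' = (\nu+1)\mu_1', \qquad \mu_4' = \nu(\nu+2),$$

whence we find for moments about the mean

$$\mu_2 = \frac12 - \frac{1}{8\nu} + \ldots, \qquad \mu_3 = \frac{1}{4\sqrt{\nu}} + \ldots, \qquad \mu_4 = \frac34 - \frac{3}{8\nu} + \ldots$$

Hence for the constants of $\sqrt{2\chi^2}$ we have

$$\gamma_1 = \frac{1}{\sqrt{2\nu}} + \ldots \tag{12.20}$$

$$\gamma_2 = O(\nu^{-2}). \tag{12.21}$$

Comparison with (12.17) and (12.18) shows that $\sqrt{2\chi^2}$ tends to normality with considerably greater rapidity than χ². Moreover, the expression for $\mu_1'$ of χ is equal to $\sqrt{\nu - \frac12}$ to order $\nu^{-1}$ and hence $\sqrt{2\chi^2}$ is distributed about mean $\sqrt{2\nu - 1}$ to that order, with variance which is unity to order $\nu^{-1}$.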

12.8. For the Wilson-Hilferty approximation, consider the distribution of χ² about a mean value ν. Let us find the distribution of $\left(\dfrac{\chi^2}{\nu}\right)^h = y$, say, h as yet being undetermined. Write ξ = χ² − ν. Then

$$y = \left(1 + \frac{\xi}{\nu}\right)^h = 1 + \frac{h\xi}{\nu} + \frac{h(h-1)}{2!}\frac{\xi^2}{\nu^2} + \frac{h(h-1)(h-2)}{3!}\frac{\xi^3}{\nu^3} + \text{etc.}$$

Taking mean values and using the results of (12.13) we find, after some reduction,

$$\mu_1'(y) = 1 + \frac{h(h-1)}{\nu} + \frac{h(h-1)(h-2)(3h-1)}{6\nu^2} + O(\nu^{-3}). \tag{12.22}$$

If in this we put rh for h we find the mean value of $y^r$ and thus

$$\mu_r'(y) = 1 + \frac{rh(rh-1)}{\nu} + \ldots \tag{12.23}$$

We now choose h so that the third term in (12.22) vanishes, i.e. we take $h = \frac13$. We then find

$$
\begin{aligned}
\mu_1'(y) &= 1 - \frac{2}{9\nu} + \frac{80}{2187\nu^3} + O(\nu^{-4})\\
\mu_2'(y) &= 1 - \frac{2}{9\nu} + \frac{4}{81\nu^2} + \frac{56}{2187\nu^3} + O(\nu^{-4})\\
\mu_3'(y) &= 1\\
\mu_4'(y) &= 1 + \frac{4}{9\nu} - \frac{4}{27\nu^2} + \frac{80}{2187\nu^3} + O(\nu^{-4})
\end{aligned}
$$

or, transferring to the mean,

$$
\begin{aligned}
\mu_2(y) &= \frac{2}{9\nu} - \frac{104}{2187\nu^3} + O(\nu^{-4})\\
\mu_3(y) &= \frac{32}{729\nu^3} + O(\nu^{-4})\\
\mu_4(y) &= \frac{4}{27\nu^2} - \frac{16}{729\nu^3} + O(\nu^{-4}).
\end{aligned}
\tag{12.24}
$$

We now find

$$\gamma_1 = \frac{2^{\frac72}}{27\,\nu^{\frac32}} \tag{12.25}$$

$$\gamma_2 = -\frac{4}{9\nu}. \tag{12.26}$$

Comparison with (12.17), (12.18), (12.20) and (12.21) shows that $\left(\dfrac{\chi^2}{\nu}\right)^{\frac13}$ tends to symmetry, as measured by $\gamma_1$, more rapidly than either χ² or $\sqrt{2\chi^2}$. To order $\nu^{-2}$ the variance is, from (12.24), equal to $\dfrac{2}{9\nu}$ and the mean $1 - \dfrac{2}{9\nu}$.

The result may also be expressed by saying that

$$\left\{\left(\frac{\chi^2}{\nu}\right)^{\frac13} + \frac{2}{9\nu} - 1\right\}\sqrt{\frac{9\nu}{2}} \tag{12.27}$$

is distributed normally about zero mean with unit variance.

The following table, quoted from Garwood (1936), shows some comparisons of the two approximations with the actual values. In each case there have been worked out the values of ½χ² corresponding to $P = \int dF$ of 0.01, 0.05, 0.95 and 0.99. The exact values are denoted by $m_E$, those given by Fisher's approximation $m_F$ and those by Wilson and Hilferty's approximation by $m_W$.

TABLE 12.1
Comparison of Approximations to the χ² Integral.
(From Garwood, Biometrika, 28, 437.)

    P         ν       m_E       m_F     m_E − m_F     m_W     m_E − m_W

    0.99      40     11.082    10.764     0.318      11.070      0.012
              60     18.742    18.414     0.328      18.732      0.010
              80     26.770    26.436     0.334      26.761      0.009
             100     35.032    34.694     0.338      35.025      0.007

    0.95      40     13.255    13.116     0.139      13.254      0.001
              60     21.594    21.455     0.139      21.594      0.000
              80     30.196    30.056     0.140      30.196      0.000
             100     38.965    38.825     0.140      38.965      0.000

    0.01      42     33.103    32.700     0.403      33.113    − 0.010
              62     45.401    45.003     0.398      45.409    − 0.008
              82     57.347    56.953     0.394      57.355    − 0.008
             102     69.067    68.676     0.391      69.074    − 0.007

    0.05      42     29.062    28.919     0.143      29.060      0.002
              62     40.691    40.548     0.143      40.689      0.002
              82     52.069    51.926     0.143      52.068      0.001
             102     63.287    63.144     0.143      63.286      0.001

The $m_W$ approximation is evidently very good and the $m_F$ approximation is fair.
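The two normal approximations are easily programmed. The sketch below is an illustration only: the function names are assumptions, P is taken as the probability in the upper tail, and the normal deviate is obtained from `statistics.NormalDist.inv_cdf`. The comparison relies on the observation that the tabulated values are one-half of the corresponding χ² points.

```python
from math import sqrt
from statistics import NormalDist


def fisher_chi2(nu, upper_tail_p):
    """Fisher: sqrt(2 chi^2) is roughly normal about sqrt(2 nu - 1), unit variance."""
    z = NormalDist().inv_cdf(1 - upper_tail_p)
    return 0.5 * (z + sqrt(2 * nu - 1)) ** 2


def wilson_hilferty_chi2(nu, upper_tail_p):
    """Wilson-Hilferty: (chi^2 / nu)^(1/3) is roughly normal about
    1 - 2/(9 nu) with variance 2/(9 nu)."""
    z = NormalDist().inv_cdf(1 - upper_tail_p)
    c = 2 / (9 * nu)
    return nu * (1 - c + z * sqrt(c)) ** 3


# Halved chi-square points, for comparison with the m_F and m_W columns.
m_f_40 = fisher_chi2(40, 0.99) / 2
m_w_40 = wilson_hilferty_chi2(40, 0.99) / 2
m_f_42 = fisher_chi2(42, 0.01) / 2
m_w_42 = wilson_hilferty_chi2(42, 0.01) / 2
```

For ν = 40, P = 0.99 these give values agreeing with the tabulated 10.764 and 11.070, and for ν = 42, P = 0.01 with 32.700 and 33.113, to the three decimals shown.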


12.9. A third method of approximation may be obtained by the method of 6.32 and 6.33, and in fact our Example 6.5 was virtually based on the χ²-distribution. From equation (0.75) with $l_1 = l_2 = 0$, $l_3 = \kappa_3 = \dfrac{2\sqrt{2}}{\sqrt{\nu}}$, $l_4 = \kappa_4 = \dfrac{12}{\nu}$, $l_5 = \kappa_5 = \dfrac{48\sqrt{2}}{\nu^{\frac32}}$, $l_6 = \kappa_6 = \dfrac{480}{\nu^2}$, we find from (6.73) or (6.75) a normal deviate whose distribution function approximates to that of χ².

Examples of the Use of the χ²-Distribution

12.10. We now proceed to consider some examples of the use of the χ²-distribution in comparing observation and hypothesis.

Example 12.1

In Table 12.2 we repeat some of the material of Table 5.4, giving the distribution of first hands at whist according to the number of trumps held. The observed frequencies ... The theoretical figures λ are based on the hypothesis that the distribution follows the hypergeometric series. The last column shows the contributions $\dfrac{(l-\lambda)^2}{\lambda}$ to the total χ². To avoid small frequencies we have grouped together those frequencies of 7 or over.

TABLE 12.2
Distribution of First Hands at Whist according to Number of Cards held of a given Suit.
Hands Dealt but not Played.

    Number of Cards.    Observed Frequency.    Theoretical Frequency.    (l − λ)²/λ
                               l                        λ

    0                           35                     43.5                 1.661
    1                          290                    272.2                 1.164
    2                          696                    700.0                 0.023
    3                          937                    973.5                 1.369
    4                          851                    811.3                 1.943
    5                          444                    424.0                 0.943
    6                          115                    141.3                 4.895
    7 (and over)                32                     34.2                 0.142

    TOTAL                     3400                   3400.0                12.140

The total χ² is seen to be 12.140. The number of degrees of freedom ν is one fewer than the number of cells (excluding the total), namely 7. From the diagram in the appendix it is seen that the probability of getting a value as great as this or greater on random sampling $\left(= \int_{\chi^2}^{\infty} dF = P\right)$ lies between 0.1 and 0.05, very close to the former. The odds are therefore about 9 to 1 against getting ... diverge to a greater extent from theoretical ...

TABLE 12.3
Distribution of First Hands at Whist according to Number of Cards held of a given Suit.
Hands actually Played.


Number of Cards, | Observed Теа. Theoretical Frequency. g = a | 
À | A 
| | 
0 215 320 
1 1,724 2,002 
2 5,202 5,147 
3 7,440 7,158 
4 6,971 5,965 
5 117 
2 2,950 
3,117 
6 852 1,039 
UU CASU: nis 166 Б 
8 апа over . , 20 31 
TOTAL pow 1 25 | 
| 5,000 24,999 174-130 
\ 


In Table 12.3 the total χ² is 174.130 and ν = 8 (one more than in the previous example because we have grouped only those frequencies of 8 and over). From the diagram it is clear that the chance of getting such a value or a greater one is exceedingly small, certainly less than 1 in 10,000. This very rare event leads us to reject the hypothesis that the hypergeometric distribution is operating. The explanation is probably that these deals were taken from actual play, whereas those of the previous table were obtained without actually playing the hands. It is evident that in card play certain kinds of card (e.g. those of the same suit) tend to collect together and the shuffling is apt to be somewhat perfunctory. Thus the condition of realisation of the hypergeometric distribution, that the selection is random, was probably violated.


Example 12.2

In some classical experiments on pea-breeding Mendel obtained the following frequencies for different kinds of seeds in crosses from plants with round yellow seeds and wrinkled green seeds:

                               Observed    Theoretical
    Round and yellow              315         312.75
    Wrinkled and yellow           101         104.25
    Round and green               108         104.25
    Wrinkled and green             32          34.75

    TOTAL                         556         556.00

On the Mendelian theory of inheritance the frequencies should be in proportion 9, 3, 3, 1 and the theoretical frequencies are shown in the last column.

We find

$$\chi^2 = \frac{(2.25)^2}{312.75} + \frac{(3.25)^2}{104.25} + \frac{(3.75)^2}{104.25} + \frac{(2.75)^2}{34.75} = 0.4700.$$

The number of degrees of freedom ν = 3. The probability of obtaining the value of χ² or greater is seen to be between 0.9 and 0.95. There is thus nothing in the value of χ² to lead us to reject the Mendelian hypothesis.
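The arithmetic of Example 12.2 can be reproduced in a few lines. This is a minimal sketch with illustrative variable names; the theoretical frequencies are formed from the 9 : 3 : 3 : 1 ratio and the observed total, and exact fractions are used so that no rounding enters before the final conversion.

```python
from fractions import Fraction

# Observed frequencies: round yellow, wrinkled yellow, round green, wrinkled green.
observed = [315, 101, 108, 32]
ratio = [9, 3, 3, 1]

n = sum(observed)
theoretical = [Fraction(n * r, sum(ratio)) for r in ratio]

# chi^2 = sum of (l - lambda)^2 / lambda over the cells.
chi2 = sum((l - lam) ** 2 / lam for l, lam in zip(observed, theoretical))

# One linear constraint (the total) among four cells.
degrees_of_freedom = len(observed) - 1
```

Here `[float(t) for t in theoretical]` gives 312.75, 104.25, 104.25 and 34.75, and `float(chi2)` rounds to 0.4700, with 3 degrees of freedom.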


12.11. Consider now a table of the type of Table 12.4, which shows the frequencies of a number of men according to eye colour and hair colour. If, on some hypothesis as to the relationship between eye and hair colour we determine theoretical frequencies in the body of the table, leaving the border row and border column unchanged, then there are a number of linear constraints on these frequencies.

In fact, if in such a table there are r rows and s columns it will be found that only (r − 1)(s − 1) cells can be filled up arbitrarily. There are rs cells altogether; but the fact that the rows and columns must add to assigned totals imposes r + s constraints. These, however, are not independent, for the sum of the border column frequencies is equal to that of the border row frequencies and thus there are only r + s − 1 independent linear constraints. Hence rs − (r + s − 1) = (r − 1)(s − 1) cells are independent and this is ν, the number of degrees of freedom associated with such a table.


Example 12.3

In Table 12.4 suppose that eye and hair colour are independent. Then the expected
frequency in any cell with a row total x and a column total y will be xy/n, where n is the
total frequency.

TABLE 12.4
Distribution of 6800 Males according to Colour of Eye and Hair.
(Ammon, Zur Anthropologie der Badener.)

                                 Hair colour
Eye colour          Fair.    Brown.    Black.    Red.    TOTALS.
Blue    .   .   .   1768      807       189       47      2811
Grey or Green   .    946     1387       746       53      3132
Brown   .   .   .    115      438       288       16       857
     TOTALS         2829     2632      1223      116      6800

For instance, the expected number of men with fair hair and blue eyes is
2811 × 2829 / 6800 = 1169. The theoretical frequencies obtained in this way are:

                    Fair    Brown    Black    Red
Blue    .   .   .   1169     1088     506     48.0
Grey or Green   .   1303     1212     563     53.4
Brown   .   .   .    357      332     154     14.6

Hence

$$\chi^2 = \frac{(1768 - 1169)^2}{1169} + \text{etc.} = 1075.2, \qquad \nu = (4 - 1)(3 - 1) = 6.$$
The value of χ² is very improbable, P being less than 0.000001. We accordingly reject
the hypothesis of independence and conclude that hair colour and eye colour are associated.
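The calculation of Example 12.3 can be sketched as follows. The function name is hypothetical. Because the sketch keeps the theoretical frequencies unrounded, it gives a value slightly below the 1075.2 obtained in the text from frequencies rounded to the nearest unit.

```python
def independence_chi_square(table):
    """Chi-square and degrees of freedom for an r x s table under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n   # expected frequency xy/n
            chi2 += (obs - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Table 12.4: rows are eye colour, columns are hair colour (fair, brown, black, red).
eye_hair = [
    [1768, 807, 189, 47],    # blue
    [946, 1387, 746, 53],    # grey or green
    [115, 438, 288, 16],     # brown
]
chi2, dof = independence_chi_square(eye_hair)
print(round(chi2, 1), dof)
```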
12.12. It is useful to note that χ² may be put into a form which is sometimes more
convenient for calculation. We have

$$\chi^2 = \sum \frac{(l - \lambda)^2}{\lambda} = \sum \frac{l^2}{\lambda} - 2\sum l + \sum \lambda = \sum \frac{l^2}{\lambda} - n. \qquad (12.28)$$

When the λ's are not integers it is easier to work with this formula, the squaring of the
larger numbers l involving less arithmetic than the squaring of the smaller but non-integral
numbers l − λ.
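As a numerical check of (12.28), the following sketch evaluates χ² in both forms for the Mendelian data of Example 12.2. The function names are illustrative; the identity relies on the observed and theoretical frequencies having the same total n.

```python
def chi_square_direct(observed, expected):
    """Definition: sum of (l - lambda)^2 / lambda."""
    return sum((l - lam) ** 2 / lam for l, lam in zip(observed, expected))

def chi_square_shortcut(observed, expected):
    """Form (12.28): sum of l^2 / lambda, less the total frequency n."""
    n = sum(observed)
    return sum(l * l / lam for l, lam in zip(observed, expected)) - n

observed = [315, 101, 108, 32]
expected = [312.75, 104.25, 104.25, 34.75]
direct = chi_square_direct(observed, expected)
shortcut = chi_square_shortcut(observed, expected)
print(direct, shortcut)
```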


12.13. In the foregoing examples the theoretical frequencies λ were calculated without
reference to the experimental data other than totals, which merely subjected them to linear
constraints and hence preserved the Type III distribution of χ². There also arises the
much more difficult case in which certain parameters necessary for the determination of
the theoretical frequencies have to be determined from the data themselves. Suppose,
for example, we attempt to represent a frequency-distribution by a normal curve. We
have then to decide on the mean and variance of this curve, and they can, as a rule, only
be estimated from the data themselves. The question then arises, what happens to the
χ²-distribution if, instead of the unknown parameters leading to the theoretical frequencies λ,
we use estimates leading to the estimated theoretical frequencies, say, λ′? That is to say,
how does the distribution of the statistic

$$\chi'^2 = \sum \frac{(l - \lambda')^2}{\lambda'}$$

compare with that of

$$\chi^2 = \sum \frac{(l - \lambda)^2}{\lambda}\,? \qquad (12.29)$$


This problem has not yet been completely solved. The nearest approach to a solution
has been reached by R. A. Fisher (1924), who showed that χ′² is distributed in the Type III
form, provided that

(a) the sample is large and that each cell-frequency is large;

(b) that the number of degrees of freedom is reduced by unity for every parameter
estimated;

(c) that the principle of estimation involved is such as to minimise χ′². This is
equivalent, for large samples, to taking a maximum likelihood estimate.

Departing from our usual practice, we shall have at this stage to state this result
without proof. It cannot be adequately discussed until we have dealt with the principles
of estimation in the second volume.
Example 12.4

The following table shows the distribution of 12 dice thrown 26,306 times, a 5 or 6 being
reckoned a success. We have encountered these data before in Table 5.1.

TABLE 12.5
Distribution of 12 Dice thrown 26,306 times, a 5 or 6 being reckoned a Success.

Number of       Observed       Frequency of            Frequency of
Successes.      Frequency.     26,306(2/3 + 1/3)^12.   26,306(0.6623 + 0.3377)^12.
 0                 185              203                     187
 1               1,149            1,217                   1,146
 2               3,265            3,345                   3,215
 3               5,475            5,576                   5,465
 4               6,114            6,273                   6,269
 5               5,194            5,018                   5,115
 6               3,067            2,927                   3,043
 7               1,331            1,254                   1,330
 8                 403              392                     424
 9                 105               87                      96
10 and over         18               14                      16
  TOTALS        26,306           26,306                  26,306
—= | i 


The third column shows the frequencies if the dice were perfect, that is the frequencies
of the binomial law 26,306(2/3 + 1/3)^12. We find, in the usual way,

$$\chi^2 = \frac{(203 - 185)^2}{203} + \text{etc.} = 35.941, \qquad \nu = 10.$$

P is very small, less than 0.0001, and we conclude that the hypothesis is to be rejected,
i.e. that the dice were not perfect or that something was wrong with the sampling. In
this particular case great care was in fact taken in rolling the dice and the balance lies in
favour of rejecting the hypothesis that they were entirely unbiassed.

Let us then reconsider the data. If the dice are biassed, what is the true probability
of getting a 5 or 6? This we must estimate from the data and it has already been seen
that a maximum likelihood estimate is the observed proportion of successes in the sample itself.
This is found to be 0.3377 and the last column in Table 12.5 shows the frequencies
26,306(0.6623 + 0.3377)^12. The agreement with observation is evidently closer and we
find now χ² = 8.201. ν is now 9, for we have estimated one parameter. P is now about
0.5, so that the observed frequencies are in quite good accord with theory.
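The two tests of Example 12.4 can be reproduced from the columns of Table 12.5 with the sketch below. The helper names are hypothetical, and the finite-sum expressions for the tail probability with an integral number of degrees of freedom are standard results assumed here in place of the Appendix tables.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(chi-square >= x) for integer degrees of freedom."""
    if dof % 2 == 0:
        # Even dof: a finite sum of Poisson terms.
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd dof: normal tail plus a finite correction sum.
    term, total = 1.0, 0.0
    for k in range((dof - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total

def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def binomial_frequencies(total, n_dice, p, top=10):
    """Theoretical frequencies total*(q + p)^n_dice, grouping 'top and over'."""
    probs = [math.comb(n_dice, k) * p ** k * (1 - p) ** (n_dice - k)
             for k in range(n_dice + 1)]
    return [total * q for q in probs[:top] + [sum(probs[top:])]]

observed = [185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 18]
fair = [203, 1217, 3345, 5576, 6273, 5018, 2927, 1254, 392, 87, 14]     # p = 1/3
fitted = [187, 1146, 3215, 5465, 6269, 5115, 3043, 1330, 424, 96, 16]   # p = 0.3377

fair_computed = binomial_frequencies(26306, 12, 1 / 3)
chi2_fair = chi_square(observed, fair)        # no parameter estimated: 10 d.f.
chi2_fitted = chi_square(observed, fitted)    # one parameter estimated: 9 d.f.
p_fair = chi2_sf(chi2_fair, 10)
p_fitted = chi2_sf(chi2_fitted, 9)
print(round(chi2_fair, 3), p_fair, round(chi2_fitted, 3), round(p_fitted, 2))
```

The sketch uses the tabulated theoretical frequencies, rounded to whole numbers as in the table, so that the values of χ² agree with those of the example.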


12.14. Since χ² is the sum of the squares of a certain number of independent normal
variates each with zero mean and unit variance, a number of different independent values of χ² may
be added together and will be distributed in the Type III form with a number of degrees
of freedom equal to the sum of the individual numbers. This result enables us to combine
the results of a set of experiments so as to determine the probability of the whole set taken
together. For example, Table 12.6 shows the data for inoculation against cholera on a
certain tea estate.


TABLE 12.6
Inoculation against Cholera on a certain Tea Estate.

                     Attacked.    Not Attacked.    TOTALS.
Inoculated   .   .     431              5            436
Not inoculated   .     291              9            300
     TOTALS            722             14            736


We find χ² = 3.27, ν = 1 and P, from Appendix Table 7, about 0.071. This is
small, but not small enough to reject the hypothesis, particularly when we note that
the theoretical frequencies in the not-attacked column are far from large.

The results for six such estates were:

    χ²         P         ν
   3.27      0.071       1
   9.34      0.0022      1
   6.08      0.014       1
   2.51      0.11        1
   5.61      0.018       1
   1.59      0.21        1
  28.40                  6


Here only one value of P is less than 0.01, and we might be inclined to doubt the reality
of association between inoculation and immunity. The sum of χ² is, however, 28.40 with
ν = 6, for which values we find P < 0.0001; so that together the results are significant.
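The combination of the six estates can be sketched as below. The function names are hypothetical; the 2 × 2 formula used for the first estate is that of Exercise 12.5, and the tail probabilities are computed from standard closed forms rather than from the Appendix tables.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(chi-square >= x) for integer degrees of freedom."""
    if dof % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for k in range((dof - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total

def chi2_fourfold(a, b, c, d):
    """Chi-square for a 2 x 2 table under independence (form of Exercise 12.5)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Table 12.6: the first estate.
first_chi2 = chi2_fourfold(431, 5, 291, 9)
first_p = chi2_sf(first_chi2, 1)

# Values of chi-square for the six estates, each with one degree of freedom.
estates = [3.27, 9.34, 6.08, 2.51, 5.61, 1.59]
total_chi2 = sum(estates)
total_dof = len(estates)
total_p = chi2_sf(total_chi2, total_dof)
print(round(first_chi2, 2), round(first_p, 3), round(total_chi2, 2), total_p)
```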


The 2 x 2 Bivariate Table 


12.15. We now return to a point which has been mentioned incidentally. If the
theoretical frequencies in cells are small the Type III distribution will be only an
approximation, depending on how closely the binomial distributions in the cells are
adequately represented by normal distributions. For some problems we can overcome the
difficulty by grouping small frequencies, as has been done above in Example 12.4; but
such a process sacrifices information and cannot always be carried out, e.g. in a 2 × 2
table. Consider in the first place the symmetrical binomial (1/2 + 1/2)^10. The second column
in the following table shows the probability P that the number of successes in the first
column will be at most attained (for the smaller of the pair) or attained or exceeded (for the
larger of the pair).

Successes        P          P₁         P₂
  0, 10        0.0010     0.0008     0.0022
  1, 9         0.0108     0.0057     0.0134
  2, 8         0.0547     0.0290     0.0569
  3, 7         0.1719     0.1030     0.1714
  4, 6         0.3770     0.2635     0.3759


If we regard this frequency as that of a single cell (ν = 1), the values of χ² corresponding
to the five rows are (2/5)(5², 4², 3², 2², 1²), and the probabilities corresponding to these
values are shown in the third column as P₁. They may be obtained
from Appendix Tables 6 and 7, e.g. for the last term we have χ² = 2/5 = 0.4, P = 0.52709,
P₁ = 1/2 of this = 0.2635.

The correspondence between P and P₁ is evidently not very good. We can, however,
improve it considerably by a correction due to Yates (1934). The distribution of χ² is
continuous, whereas that of the binomial is not. To bring the two into comparability
we really should consider the binomial frequency at the value r as spread over the range
r − 1/2 to r + 1/2. For example, for a deviation 3 corresponding to 8 successes we should
take a deviation 2.5, giving

$$\chi^2 = \frac{2(2.5)^2}{5} = 2.5$$

instead of 3.6. The values given by the
corrected χ² are shown as P₂ above. The agreement between P₂ and P is evidently a great
improvement on that between P₁ and P.
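The three columns P, P₁ and P₂ may be recomputed with the sketch below. The function names are illustrative. For one degree of freedom the tail probability of χ² is taken as erfc(√(χ²/2)), a standard identity assumed here in place of Appendix Tables 6 and 7.

```python
import math

N_TRIALS = 10   # the symmetrical binomial (1/2 + 1/2)^10

def binomial_tail(k, n=N_TRIALS):
    """Exact probability of k successes or fewer."""
    return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

def half_chi_square_tail(deviation, n=N_TRIALS):
    """Half the probability that chi-square (1 d.f.) exceeds 2*d^2/(n/2)."""
    chi2 = 2 * deviation ** 2 / (n / 2)
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))

rows = []
for k in range(5):
    d = N_TRIALS / 2 - k                      # deviation from the expectation 5
    p_exact = binomial_tail(k)                # column P
    p_plain = half_chi_square_tail(d)         # column P1, no correction
    p_yates = half_chi_square_tail(d - 0.5)   # column P2, continuity correction
    rows.append((k, p_exact, p_plain, p_yates))

for k, p, p1, p2 in rows:
    print(f"{k:2d},{N_TRIALS - k:2d}  {p:.4f}  {p1:.4f}  {p2:.4f}")
```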


When the theoretical proportion in a cell is not 1/2, the binomial distribution is skew,
and there do not appear to be any simple corrections to compensate for this effect. The
continuity correction will, however, result in an improvement if the theoretical frequency
is near 1/2 and is probably best made in all circumstances.


12.16. The 2 × 2 table may also be dealt with by exact methods. Consider in fact
the table

      a          b        a + b
      c          d        c + d
    a + c      b + d        n

If the two variates are independent, the number of ways in which a table with such
marginal totals can be constructed from the n sample members is

$$\frac{n!}{(a+b)!\,(c+d)!} \times \frac{n!}{(a+c)!\,(b+d)!}.$$

The number of ways in which the body of the array can be completed is

$$\frac{n!}{a!\, b!\, c!\, d!}.$$

Consequently the probability of the distribution of the table is

$$\frac{(a+c)!\,(b+d)!\,(a+b)!\,(c+d)!}{n!\; a!\, b!\, c!\, d!}. \qquad (12.30)$$
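Thus, the marginal totals being fixed, the probabilities of the successive tables obtained
by raising a and d by unity (b and c falling by unity) are found from (12.30) by multiplying
successively by

$$\frac{bc}{(a+1)(d+1)}, \qquad \frac{(b-1)(c-1)}{(a+2)(d+2)}, \qquad \text{etc.} \qquad (12.31)$$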
Example 12.5 (from F. Yates, 1934, quoting data by M. Hellman).

The following table shows the number of children classified according to condition
of the teeth and type of feeding.

                  Normal Teeth.    Maloccluded Teeth.    TOTALS.
Breast fed   .         4                 16                20
Bottle fed   .         1                 21                22
     TOTALS            5                 37                42

From (12.30) the probability of obtaining no normal breast-fed children if the attributes
are independent is (a = 0)

$$\frac{5!\; 37!\; 20!\; 22!}{42!\; 0!\; 20!\; 5!\; 17!} = 0.03096.$$

The probabilities of obtaining 1, 2, . . . children are obtained by multiplying successively by

$$\frac{5 \times 20}{1 \times 18}, \qquad \frac{4 \times 19}{2 \times 19}, \qquad \frac{3 \times 18}{3 \times 20},$$

and so on, and are as follows:

Number of Normal          Probability.
Breast-fed Children.
        0                   0.0310
        1                   0.1720
        2                   0.3440
        3                   0.3096
        4                   0.1253
        5                   0.0182
                            1.0001


Thus the probability of getting four or more normal breast-fed children is 0.1435, and
we conclude that there is nothing to reject the hypothesis that the type of feeding has no
effect on the condition of the teeth. Had we used the χ² test in the ordinary way we
should have found P = 0.0612, less than half the true value. The continuity correction
makes a great improvement, giving P = 0.1427.
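The exact calculation of Example 12.5, and its comparison with the χ² approximation, can be sketched as follows. The function names are hypothetical, and the one-sided normal tail is assumed as the way the values 0.0612 and 0.1427 were reached; the corrected value computed here agrees with the text to about three decimal places.

```python
import math

def table_probability(a, b, c, d):
    """Probability (12.30) of a 2 x 2 table with given marginal totals."""
    f = math.factorial
    n = a + b + c + d
    return (f(a + c) * f(b + d) * f(a + b) * f(c + d)) / (f(n) * f(a) * f(b) * f(c) * f(d))

def one_sided_p(a, b, c, d, continuity=False):
    """Half the chi-square (1 d.f.) tail, with or without Yates's correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if continuity:
        diff -= n / 2
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))

# Margins of the table: 20 breast fed, 22 bottle fed, 5 with normal teeth.
row1, row2, col1 = 20, 22, 5
probs = [table_probability(a, row1 - a, col1 - a, row2 - (col1 - a))
         for a in range(col1 + 1)]
exact_tail = sum(probs[4:])                    # four or more normal breast-fed children

p_uncorrected = one_sided_p(4, 16, 1, 21)
p_corrected = one_sided_p(4, 16, 1, 21, continuity=True)
print([round(p, 4) for p in probs], round(exact_tail, 4))
print(round(p_uncorrected, 4), round(p_corrected, 4))
```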


NOTES AND REFERENCES 


The χ²-distribution, though known to Helmert in 1876, was rediscovered and applied
to statistical problems by Karl Pearson in 1900. In 1922 Yule and Fisher gave respectively
experimental and theoretical evidence for what is now accepted as the correct method
of determining the number of degrees of freedom in a bivariate table; but Pearson himself
seems never to have acknowledged the soundness of this method, and some papers written
between 1920 and 1930 on this subject are controversial and therefore not to be accepted
uncritically.

For the use of the distribution in testing hypotheses when parent parameters have
to be estimated, see Fisher (1924). For the exact test in a 2 × 2 table and the continuity
correction, see Yates (1934).

More recently, Cochran (1936) and Haldane (1937a, 1937b, 1939a, 1939b) have discussed
the distribution of χ² in bivariate tables when some hypothetical frequencies are
small. See also Haldane, Biometrika (1945), 33, 231 and 234.


Cochran, W. G. (1936), "The χ²-distribution for the binomial and Poisson series, with
small expectations," Ann. Eugen., Lond., 7, 207.

Cochran, W. G. (1938), "Note on J. B. S. Haldane's paper [1937a below]," Biometrika, 29, 407.

David, F. N. (1947), "A χ² 'smooth' test for goodness of fit," Biometrika, 34, 299.

Fisher, R. A. (1922), "On the interpretation of χ² from contingency tables and the
calculation of P," Jour. Roy. Statist. Soc., 85, 87.

Fisher, R. A. (1924), "The conditions under which χ² measures the discrepancy between
observation and hypothesis," Jour. Roy. Statist. Soc., 87, 442.

Garwood, F. (1936), "Fiducial limits for the Poisson distribution," Biometrika, 28, 437.

Haldane, J. B. S. (1937a), "The exact value of the moments of the distribution of χ² used
as a test of goodness of fit when expectations are small," Biometrika, 29, 133.

Haldane, J. B. S. (1937b), "The first six moments of χ² for an n-fold table with n degrees
of freedom when some expectations are small," Biometrika, 29, 389.

Haldane, J. B. S. (1937c), "The approximate normalisation of a class of frequency
distributions," Biometrika, 29, 392.

Haldane, J. B. S. (1939a), "Corrections to formulae in papers on the moments of χ²,"
Biometrika, 31, 220.

Haldane, J. B. S. (1939b), "The mean and variance of χ² when used as a test of homogeneity
when samples are small," Biometrika, 31, 346.

Haldane, J. B. S. (1939c), "The cumulants and moments of the binomial distribution and
the cumulants of χ² for an n × 2 fold table," Biometrika, 31, 392.

Wilson, E. B., and Hilferty, M. M. (1931), "The distribution of chi-square," Nat. Acad.
Sci., 17, 694.

Yates, F. (1934), "Contingency tables involving small numbers and the χ² test," Supp.
Jour. Roy. Statist. Soc., 1, 217.

Yule, G. U. (1922), "An application of the χ² method to association and contingency tables
with experimental illustrations," Jour. Roy. Statist. Soc., 85, 95.




EXERCISES 
12.1. By the method of 12.8 show that 


135? — үх | 5 T. ) 64/2v 1 
[( 12» ) т! s i 48v 1 5 | 18v 


is approximately normally distributed with unit variance about zero mean. 
(Haldane, 1937c.) 


12.2. Use the χ²-distribution to show that the distribution of digits from telephone
directories (Table 1.4) could not in all probability have arisen by random sampling from 
a population in which each of the ten digits occurred with the same frequency. 


12.3. Show that for a 2 × n bivariate table ν = n − 1 and

$$\chi^2 = N_1 N_2 \sum_j \frac{\left(\dfrac{a_{1j}}{N_1} - \dfrac{a_{2j}}{N_2}\right)^2}{a_{1j} + a_{2j}}$$

where a_{1j}, a_{2j} are the frequencies in the jth column and N₁, N₂ are the border sums of the
two rows.
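12.4. Show that if ν is even

$$P = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} \int_{\chi^2}^{\infty} e^{-x/2}\, x^{\nu/2 - 1}\, dx
= e^{-\chi^2/2}\left\{1 + \frac{\chi^2}{2} + \frac{\chi^4}{2 \cdot 4} + \dots + \frac{\chi^{\nu-2}}{2 \cdot 4 \cdot 6 \dots (\nu - 2)}\right\}$$

and hence that the values of P for given χ² can be derived from tables of the Poisson
exponential limit.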


12.5. Show that in a 2 × 2 table whose frequencies are

      a      b
      c      d

$$\chi^2 = \frac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)},$$

the theoretical frequencies being those obtained on the hypothesis that the two variates
are independent.


12.6. An experiment gives on hypothesis H χ² = 9, ν = 8. When repeated it gives
the same result. Show that the two taken together do not give the same confidence in
H as either taken separately.


12.7. (Data from Report on the Spahlinger Experiments in Northern Ireland, 1931-
1934, H.M. Stationery Office, 1935.) In experiments on the immunisation of cattle from
tuberculosis the following results were secured:

                                    Died of Tuberculosis    Unaffected or only     TOTALS.
                                    or very seriously       slightly affected.
                                    affected.
Inoculated with vaccine   .   .           6                      13                  19
Not inoculated or inoculated
  with control media   .   .   .          8                       3                  11
          TOTALS                         14                      16                  30

Show that for this table, on the hypothesis that inoculation and susceptibility to
tuberculosis are independent, χ² = 4.75, P = 0.029; with a correction for continuity
the corresponding probability is 0.072; and that by the exact method of 12.16 P = 0.070.


12.8. (Data from Yule, Jour. Anthrop. Inst., 1906, 36, 325.)

Sixteen pieces of photographic paper were printed down to different depths of colour
from nearly white to a very deep blackish-brown. Small scraps were cut from each sheet
and pasted on cards, two scraps on each card one above the other, combining scraps from
the several sheets in all possible ways, so that there were 256 cards in the pack. Twenty
observers then went through the pack independently, each one naming each tint either
"light," "medium" or "dark."

The following table shows the name assigned to each of the two pieces of paper:

Name assigned to          Name assigned to Upper Tint.
Lower Tint.             Light.     Medium.     Dark.      TOTALS.
Light    .   .   .        850        571        580        2001
Medium   .   .   .        618        593        455        1666
Dark     .   .   .        540        456        457        1453
     TOTALS              2008       1620       1492        5120

Show that there is a significant association between the name assigned to one piece
and the name assigned to the other.


12.9. Derive the X?-distribution from that of the sum of s 
ө n 
m standard measure, Subject о у 


ЧЛ. 


quares of љ normal variates 


at; = 0, by the substitution 2) = шуух subject to 


М 


$ e 
и} = 1. 


i 
A 


CHAPTER 13 
ASSOCIATION AND CONTINGENCY 


13.1. This and the next three chapters deal with the relationship between two or
more variables. We shall consider populations, the members of which each bear one of
each of several different sets of qualities or one value of each of several different variables,
and shall discuss the measurement of the relationship among the qualities or variables
in the populations. The corresponding questions of sampling will also fall for consideration.
We may denote this branch of the theory by the general name of Theory of Dependence.


Association 

13.2. Consider in the first instance a population classified according to whether
each member bears or does not bear an attribute A. The presence of the attribute we
may denote by A and the absence by α. We shall assume that each member must either
be an A or an α, so that α = not-A and A = not-α.

Suppose that each member of the population is classified according to two attributes
A and B. Each may then be one of four kinds, AB, Aβ, αB and αβ. For example, if the
attributes are the possession of blue eyes (A) and the possession of male sex (B) we
have the four possible classes AB = blue-eyed males, Aβ = blue-eyed females, αB = not-
blue-eyed males, αβ = not-blue-eyed females. Denoting the number in any class by the
letters appropriate to that class in ordinary round brackets, we may then represent the
population in the tabular form:

               B's       not-B's     TOTALS.
  A's          (AB)       (Aβ)        (A)
  not-A's      (αB)       (αβ)        (α)          (13.1)
  TOTALS       (B)        (β)          N

or, more simply, by

      a          b        a + b
      c          d        c + d                    (13.2)
    a + c      b + d        N

where a = (AB), etc. Here N is the total number in the population.


13.3. If there is no relationship between the attribute A and the attribute B, that
is to say if the possession of A is irrelevant to the possession of B, then there must be
the same proportion of A's among the B's as among the β's. We thus define two attributes
to be independent if

$$\frac{(AB)}{(B)} = \frac{(A\beta)}{(\beta)} \qquad (13.3)$$

or

$$\frac{a}{a+c} = \frac{b}{b+d} = \frac{a+b}{N}. \qquad (13.4)$$

It follows at once that each of the following is true:

$$\frac{c}{a+c} = \frac{d}{b+d} = \frac{c+d}{N} \qquad (13.5)$$

$$\frac{a}{a+b} = \frac{c}{c+d} = \frac{a+c}{N} \qquad (13.6)$$

$$\frac{b}{a+b} = \frac{d}{c+d} = \frac{b+d}{N} \qquad (13.7)$$

These are derivable by simple algebra from (13.4) and, in words, will be found to express
the same fundamental fact that the proportion of members bearing an attribute X is the
same among the Y's as among the not-Y's. It also follows that

$$a = \frac{(a+b)(a+c)}{N} \qquad (13.8)$$

with three similar equations in b, c and d.

If now in any given table

$$(AB) > \frac{(A)(B)}{N} \qquad (13.9)$$

or, in the alternative notation,

$$a > \frac{(a+b)(a+c)}{N} \qquad (13.10)$$

we shall say that A and B are positively associated. Per contra, if

$$(AB) < \frac{(A)(B)}{N} \qquad (13.11)$$

they are said to be negatively associated or disassociated.
Example 13.1

Association between inoculation against cholera and exemption from attack. (Greenwood
and Yule (1915), Proc. Roy. Soc. Medicine, 8, 113.)

The following table shows 818 cases classified according to inoculation against cholera
(attribute A) and freedom from attack (attribute B).

                     Not Attacked.    Attacked.    TOTALS.
Inoculated   .   .       276              3          279
Not-inoculated   .       473             66          539
     TOTALS              749             69          818

If the attributes were independent the frequency in the inoculated-not-attacked class
would be 279 × 749 / 818 = 255. The observed frequency is greater than this and hence
inoculation is positively associated with exemption from attack.


13.4. The reader will recognise in this example a type of 2 × 2 table which was
discussed in connection with the χ²-distribution. In fact, if the data are considered as a
sample there arises at once the question how far the positive association, which certainly
exists in the sample, is indicative of real association in the parent population. The χ²-
distribution, as shown in the previous chapter, provides an objective method of forming
a judgment on this matter. χ² itself, however, does not provide an adequate measure of
the intensity of the association. Altogether apart from sampling questions, we sometimes
wish to compare the strength of associations in different populations or between different
attributes, and some coefficients proposed for the purpose will now be considered.


13.5. The more obvious desiderata in a coefficient measuring association are (a)
that it shall vanish when the attributes are independent; (b) that it shall be +1 when
there is complete positive association and −1 when there is complete negative association;
(c) that it should increase as the frequencies proceed from dissociation to association. As
to this latter point, consider the difference between observed and "independence"
frequencies in the cell corresponding to (AB), viz.:

$$\delta = (AB) - \frac{(A)(B)}{N}. \qquad (13.12)$$

Since the border frequencies are constant it is evident that the difference in any cell between
observed and "independence" frequencies is ±δ and thus δ determines uniquely the
departure from independence. We may interpret condition (c) as meaning that our
coefficient should increase with δ. It may be noted that

$$\delta = a - \frac{(a+b)(a+c)}{a+b+c+d} = \frac{ad - bc}{N}. \qquad (13.13)$$

Following Yule (1900, 1912) we define the coefficient of association Q by the equation

$$Q = \frac{ad - bc}{ad + bc} = \frac{N\delta}{ad + bc}. \qquad (13.14)$$

It is zero if the attributes are independent, for then δ = 0. It can equal +1 only if bc = 0,
in which case there is complete association (either no α's are B's or no A's are β's), and −1
only if ad = 0, in which case there is complete disassociation. Furthermore, Q increases
with δ, for if e = bc/ad then

$$Q = \frac{1 - e}{1 + e}$$

and dQ/de is negative, as is de/dδ, so that dQ/dδ is positive.


A somewhat similar coefficient, also due to Yule, is the coefficient of colligation

$$Y = \frac{1 - \sqrt{\dfrac{bc}{ad}}}{1 + \sqrt{\dfrac{bc}{ad}}}. \qquad (13.15)$$

This also satisfies our conditions, as the reader may verify for himself.


13.6. Yet a third coefficient, which will be shown below to be related to χ², is

$$V = \frac{N\delta}{\sqrt{(A)(\alpha)(B)(\beta)}} = \frac{ad - bc}{\{(a+b)(a+c)(b+d)(c+d)\}^{1/2}}. \qquad (13.16)$$

This is evidently zero when δ = 0 and increases with δ. If V = ±1 we have

$$(a+b)(a+c)(b+d)(c+d) = (ad - bc)^2,$$

giving

$$4abcd + a^2(bc + bd + cd) + b^2(ac + ad + cd) + c^2(ab + ad + bd) + d^2(ac + ab + bc) = 0.$$

Since no frequency can be negative this can only vanish if at least two of a, b, c, d are zero.
If the frequencies in the same row or the same column vanish the case is purely nugatory. We
have then only to consider a = 0, d = 0 or b = 0, c = 0. In the first case V = −1, in the
second V = +1. It cannot lie outside these limits.


13.7. It will be observed that whereas V is unity only if two frequencies in the 2 x 2 
table vanish, Q and Y are unity if only one frequency vanishes. This raises a point in 
connection with the definition of complete association. We shall say that association is 
complete if all A's are B’s, notwithstanding that all B's are not A's. If all dumb men 
are deaf there is complete association between dumbness and deafness, however many 
deaf men there are who are not dumb. The coefficient V is unity only if all A's 
are B’s and all B's are A's, a condition which we could, if so desired, describe as absolute 
association. 

It is necessary to point out in this connection that statistical association is different
from association in the colloquial sense. In current speech we say that A and B are
associated if they occur together fairly often; but in statistics they are associated only
if A occurs relatively more or less frequently among the B's than among the not-B's. If
90 per cent. of smokers have poor digestions we cannot say that smoking and poor digestion
are associated until it is shown that less than 90 per cent. of non-smokers have poor
digestions.


Example 13.2 


Consider again the data of Example 13.1. For the various coefficients we have

$$Q = \frac{(276 \times 66) - (3 \times 473)}{(276 \times 66) + (3 \times 473)} = 0.8555$$

$$Y = \frac{1 - \sqrt{\dfrac{3 \times 473}{276 \times 66}}}{1 + \sqrt{\dfrac{3 \times 473}{276 \times 66}}} = 0.5636$$

$$V = \frac{(276 \times 66) - (3 \times 473)}{\sqrt{279 \times 539 \times 749 \times 69}} = 0.1905.$$

These values are, as might be expected, different, although they all refer to the same intensity
of association. Comparisons, however, naturally fall to be made between values of
coefficients of the same type, and the fact that different types give different values does not
affect their usefulness or the comparability of members of any one type.
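The three coefficients of Example 13.2, together with the "independence" frequency of Example 13.1, can be computed with the following sketch. The function names are illustrative and not part of the text.

```python
import math

def yule_q(a, b, c, d):
    """Coefficient of association Q of (13.14)."""
    return (a * d - b * c) / (a * d + b * c)

def yule_y(a, b, c, d):
    """Coefficient of colligation Y of (13.15)."""
    r = math.sqrt((b * c) / (a * d))
    return (1 - r) / (1 + r)

def coefficient_v(a, b, c, d):
    """Coefficient V of (13.16)."""
    return (a * d - b * c) / math.sqrt((a + b) * (a + c) * (b + d) * (c + d))

# Example 13.1: a = inoculated and not attacked, b = inoculated and attacked,
# c = not inoculated and not attacked, d = not inoculated and attacked.
a, b, c, d = 276, 3, 473, 66
n = a + b + c + d

independence_frequency = (a + b) * (a + c) / n   # frequency for cell a under independence
q = yule_q(a, b, c, d)
y = yule_y(a, b, c, d)
v = coefficient_v(a, b, c, d)
print(round(independence_frequency), round(q, 4), round(y, 4), round(v, 4))
```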


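13.8. The methods of Chapter 9 may be used to give the standard errors of the three
coefficients based on material obtained by random sampling.

We recall that for any of the four frequencies a, b, c, d we have results such as

$$\operatorname{var} a = \frac{a(N - a)}{N} \qquad (13.17)$$

$$\operatorname{cov}(a, b) = -\frac{ab}{N}. \qquad (13.18)$$

The first is merely another way of writing the expression for the variance of a binomial.
The second follows from

$$\operatorname{var}(a + b) = \operatorname{var} a + \operatorname{var} b + 2 \operatorname{cov}(a, b).$$

Then for e = bc/ad we have, writing Δ for the differential to avoid confusion with the
frequency d,

$$\frac{\Delta e}{e} = \frac{\Delta b}{b} + \frac{\Delta c}{c} - \frac{\Delta a}{a} - \frac{\Delta d}{d},$$

whence

$$\frac{\operatorname{var} e}{e^2} = \sum \frac{\operatorname{var} a}{a^2} + 2\left\{\frac{\operatorname{cov}(b,c)}{bc} + \frac{\operatorname{cov}(a,d)}{ad} - \frac{\operatorname{cov}(a,b)}{ab} - \frac{\operatorname{cov}(a,c)}{ac} - \frac{\operatorname{cov}(b,d)}{bd} - \frac{\operatorname{cov}(c,d)}{cd}\right\}.$$

Substitution from (13.17) and (13.18) gives

$$\operatorname{var} e = e^2\left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right). \qquad (13.19)$$

Now

$$Q = \frac{1 - e}{1 + e}$$

and hence

$$\Delta Q = -\frac{2\,\Delta e}{(1 + e)^2},$$

giving

$$\operatorname{var} Q = \tfrac{1}{4}(1 - Q^2)^2\left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right). \qquad (13.20)$$

In a similar way we have

$$\operatorname{var} Y = \tfrac{1}{16}(1 - Y^2)^2\left(\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right). \qquad (13.21)$$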


The sampling variance of V may be found similarly but involves rather more lengthy
algebra. Yule (1912) gives the result

$$\operatorname{var} V = \frac{1}{N}\left[1 - V^2 + \left(V + \tfrac{1}{2}V^3\right)\frac{(a - d)^2 - (b - c)^2}{\sqrt{(a+b)(a+c)(b+d)(c+d)}} - \tfrac{3}{4}V^2\left\{\frac{(a + b - c - d)^2}{(a+b)(c+d)} + \frac{(a + c - b - d)^2}{(a+c)(b+d)}\right\}\right]. \qquad (13.22)$$

In applying these formulae it is, as usual in large-sample theory, assumed that the
observed frequencies may be used instead of theoretical frequencies in the sampling
variances.


Example 13.3 


Reverting to Example 13.2, we have for the standard error of Q

$$\sigma_Q = \frac{1 - Q^2}{2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}
= \frac{1 - (0.8555)^2}{2}\sqrt{\frac{1}{276} + \frac{1}{3} + \frac{1}{473} + \frac{1}{66}} = 0.0798.$$

The coefficient Q thus probably lies in the range 0.856 ± 0.239 in the population from
which these data were derived, assuming of course that the sampling was random. The
upper limit here, of course, must be unity.
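The standard errors given by (13.20) and (13.21) can be evaluated for the data of Example 13.1 with the sketch below. The function names are hypothetical. The range quoted in the example is taken here to be three times the standard error on either side of Q, which is an interpretation of the figure 0.239 rather than something stated in the text.

```python
import math

def reciprocal_sum(a, b, c, d):
    return 1 / a + 1 / b + 1 / c + 1 / d

def standard_error_q(a, b, c, d):
    """Square root of (13.20), observed frequencies replacing theoretical ones."""
    q = (a * d - b * c) / (a * d + b * c)
    return 0.5 * (1 - q * q) * math.sqrt(reciprocal_sum(a, b, c, d))

def standard_error_y(a, b, c, d):
    """Square root of (13.21)."""
    r = math.sqrt((b * c) / (a * d))
    y = (1 - r) / (1 + r)
    return 0.25 * (1 - y * y) * math.sqrt(reciprocal_sum(a, b, c, d))

a, b, c, d = 276, 3, 473, 66
se_q = standard_error_q(a, b, c, d)
se_y = standard_error_y(a, b, c, d)
half_range = 3 * se_q            # assumed reading of the range 0.856 +/- 0.239
print(round(se_q, 4), round(se_y, 4), round(half_range, 3))
```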


Partial Association 


13.9. The coefficients described above measure the dependence of two attributes in
the statistical sense, but in order to decide whether such dependence has any causal
significance it is often necessary to consider associations in sub-populations. Suppose, for
example, a positive association is noticed between inoculation and exemption from attack.
It is natural to infer that the inoculation confers exemption, but this is not necessarily so.
It might be that the people who are inoculated are drawn largely from the richer classes,
who live in better hygienic conditions and are therefore better equipped to resist attack
or less exposed to risk. In other words, the association of A and B might be due to the
association of both with a third attribute C (wealth).

Now it is clear that this explanation would not hold if the hygienic circumstances were
constant in the population. If we then consider the association of A and B in the
sub-populations (C) (well-to-do classes) and (γ) (poorer classes) and find that it persists, the
explanation is rejected. Furthermore, if the association in (γ) was weaker than that
in (C), there would be some indication that hygienic conditions are related to exemption
from attack, though not constituting the only factor concerned.


13.10. Associations in sub-populations are called partial associations. Analogously
to (13.9), A and B are said to be positively associated in the population of C's if

$$(ABC) > \frac{(AC)(BC)}{(C)} \qquad (13.23)$$

where (ABC) represents the number of members bearing the attributes A, B and C; and
so on. We may also define coefficients of partial association, colligation, etc. such as

$$Q_{AB.C} = \frac{(ABC)(\alpha\beta C) - (A\beta C)(\alpha BC)}{(ABC)(\alpha\beta C) + (A\beta C)(\alpha BC)} \qquad (13.24)$$

which is derived from (13.14) by adding C to all the symbols representing the frequencies.


Example 13.4 


Galton's "Natural Inheritance" gives some particulars, for 78 families containing not
less than six brothers or sisters, of eye-colour in parent and child. Denoting a light-eyed
child by A, a light-eyed parent by B and a light-eyed grandparent by C, we trace every
possible line of descent and record whether a light-eyed child has light-eyed parent and
grandparent, the number of such being denoted by (ABC) and so on. The symbol (Aβγ),
for example, denotes the number of light-eyed children whose parents and grandparents
have not light eyes. The eight possible classes are

(ABC) = 1928        (αBC) = 303
(ABγ) =  596        (αBγ) = 225
(AβC) =  552        (αβC) = 395
(Aβγ) =  508        (αβγ) = 501

The first question we discuss is: does there exist any association between parent and
offspring with regard to eye-colour? We consider both the grandparent-parent group
(association of B's and C's) and the parent-child group (association of A's and B's). We
have, for the former:

Proportion of light-eyed among children of light-eyed parents,

    (BC)/(C) = 2231/3178 = 70.2 per cent.

Proportion of light-eyed among children of not-light-eyed parents,

    (Bγ)/(γ) = 821/1830 = 44.9 per cent.;

and for the latter, analogously,

    (AB)/(B) = 2524/3052 = 82.7 per cent.

    (Aβ)/(β) = 1060/1956 = 54.2 per cent.

Frequencies such as (Aβ) are calculable direct from the eight classes given above, e.g.
(Aβ) = (AβC) + (Aβγ) = 552 + 508 = 1060.

Evidently there is some positive association between parent and offspring in regard to
eye-colour. Consider now the relationship between eye-colours of grandparents and
grandchildren.

Proportion of light-eyed among grandchildren of light-eyed grandparents

    (AC)/(C) = 2480/3178 = 78.0 per cent.

Proportion of light-eyed among grandchildren of not-light-eyed grandparents

    (Aγ)/(γ) = 1104/1830 = 60.3 per cent.

The association between grandparents and grandchildren is also positive.
In tabular form the data are:

                    Grandparents.
Parents.            C         γ       TOTALS.
   B              2231       821       3052
   β               947      1009       1956
 TOTALS           3178      1830       5008

                    Parents.
Children.           B         β       TOTALS.
   A              2524      1060       3584
   α               528       896       1424
 TOTALS           3052      1956       5008

                    Grandparents.
Children.           C         γ       TOTALS.
   A              2480      1104       3584
   α               698       726       1424
 TOTALS           3178      1830       5008

The coefficients of association and colligation are:

                                       Q           Y
Grandparents-parents   .   .   .    +0.487      +0.260
Parents-children   .   .   .   .    +0.603      +0.336
Grandparents-grandchildren  .  .    +0.401      +0.209


Now the question arises: is the resemblance between grandparent and grandchild due
merely to that between grandparent and parent, parent and child? To investigate this,
we must consider the associations of grandparent and grandchild in the sub-populations
"parents light-eyed" and "parents not-light-eyed," that is, the associations of A and C
in B and β. We have:

Parents Light-eyed

Proportion of light-eyed amongst grandchildren of light-eyed grandparents

    (ABC)/(BC) = 1928/2231 = 86.4 per cent.

Proportion of light-eyed amongst grandchildren of not-light-eyed grandparents

    (ABγ)/(Bγ) = 596/821 = 72.6 per cent.

Parents not Light-eyed

Proportion of light-eyed amongst the grandchildren of light-eyed grandparents

    (AβC)/(βC) = 552/947 = 58.3 per cent.

Proportion of light-eyed amongst the grandchildren of not-light-eyed grandparents

    (Aβγ)/(βγ) = 508/1009 = 50.3 per cent.


In both cases the partial association is well marked and positive. The association 
between grandparents and grandchildren cannot, then, be due wholly to the associations 
between grandparents and parents, parents and children. There is ancestral heredity, as 
it is called, as well as parental heredity. The relevant four-fold tables are :— 


    Parents light-eyed                        Parents not-light-eyed
                    Grandparents                              Grandparents
    Children      BC      Bγ    TOTALS        Children      βC      βγ    TOTALS
    AB          1928     596     2524         Aβ           552     508     1060
    αB           303     225      528         αβ           395     501      896
    TOTALS      2231     821     3052         TOTALS       947    1009     1956

The coefficients of association and colligation are:

$$Q_{AC.B} = 0.412, \qquad Q_{AC.\beta} = 0.159,$$
$$Y_{AC.B} = 0.216, \qquad Y_{AC.\beta} = 0.080.$$
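A partial association is simply an ordinary association computed inside one sub-population. As a rough sketch, the eight ultimate class frequencies can be held in a dictionary and Q evaluated for children against grandparents within each parental class. The dictionary layout and function names are illustrative assumptions, and the frequencies not printed in the text (596, 303, 225) are obtained by subtraction.

```python
def yule_q(a, b, c, d):
    """Coefficient of association Q = (ad - bc)/(ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)

# Keys are (child, parent, grandparent); True means light-eyed.
freq = {
    (True, True, True): 1928, (True, True, False): 596,
    (False, True, True): 303, (False, True, False): 225,
    (True, False, True): 552, (True, False, False): 508,
    (False, False, True): 395, (False, False, False): 501,
}

def q_child_grandparent(parent_light):
    """Q between child (A) and grandparent (C) within one parental sub-population."""
    a = freq[(True, parent_light, True)]
    b = freq[(True, parent_light, False)]
    c = freq[(False, parent_light, True)]
    d = freq[(False, parent_light, False)]
    return yule_q(a, b, c, d)

q_in_B = q_child_grandparent(True)       # parents light-eyed
q_in_beta = q_child_grandparent(False)   # parents not-light-eyed
print(round(q_in_B, 3), round(q_in_beta, 3))
```

Both partial coefficients come out positive, which is the numerical content of the statement that the grandparent-grandchild resemblance is not wholly due to the intermediate generation.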


13.11. If there are p different attributes under consideration the number of partial associations can become very large, even for moderate p. For example, we can choose two in $\binom{p}{2}$ ways and consider their associations in all the possible sub-populations of the other $(p - 2)$, which are seen to be $3^{p-2}$ in number. Thus there are $\binom{p}{2}\,3^{p-2}$ associations. In practice, however, we need only consider a few of them.
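To see how quickly this count grows, the formula can be tabulated for small p. The short sketch below uses only the counting argument of the text (each of the remaining p - 2 attributes is specified positively, specified negatively, or left unspecified); the function name is illustrative.

```python
from math import comb

def partial_association_count(p):
    """Number of associations between pairs of p attributes over all sub-populations."""
    # comb(p, 2) ways of choosing the pair; 3 ** (p - 2) sub-populations,
    # since each other attribute is positive, negative, or unspecified.
    return comb(p, 2) * 3 ** (p - 2)

counts = {p: partial_association_count(p) for p in range(2, 8)}
print(counts)
```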


One result in this connection is worth noticing. We have, generalising δ of equation (13.13):

$$\begin{aligned}
\delta_{AB.C} + \delta_{AB.\gamma}
&= \left\{(ABC) - \frac{(AC)(BC)}{(C)}\right\} + \left\{(AB\gamma) - \frac{(A\gamma)(B\gamma)}{(\gamma)}\right\} \\
&= (AB) - \frac{N(AC)(BC) - (A)(C)(BC) - (B)(C)(AC) + (A)(B)(C)}{(C)(\gamma)} \\
&= (AB) - \frac{(A)(B)}{N} - \frac{N}{(C)(\gamma)}\left\{(AC) - \frac{(A)(C)}{N}\right\}\left\{(BC) - \frac{(B)(C)}{N}\right\} \\
&= \delta_{AB} - \frac{N}{(C)(\gamma)}\,\delta_{AC}\,\delta_{BC}. \qquad (13.25)
\end{aligned}$$

If, then, A and B are independent in both (C) and (γ), $\delta_{AB.C} = \delta_{AB.\gamma} = 0$ and

$$\delta_{AB} = \frac{N}{(C)(\gamma)}\,\delta_{AC}\,\delta_{BC},$$

i.e. they are not independent in the population as a whole unless C is independent of A or B or both in that population.

This peculiar result indicates that illusory associations may arise when two populations (C) and (γ) are amalgamated, or that real associations may be obscured. If A and C, B and C, are associated we have, from (13.25),

$$\delta_{AB} = \frac{N}{(C)(\gamma)}\,\delta_{AC}\,\delta_{BC} + \delta_{AB.C} + \delta_{AB.\gamma},$$

so that even if A and B are independent in (C) and (γ) they will appear as associated in the two together. Again, if A and B are associated positively in (C) and negatively in (γ), $\delta_{AB}$ may be zero, that is to say, they may appear as independent in the whole population.

Example 13.5 


Consider the case in which a number of patients are treated for a disease and there is noted the number of recoveries. Denoting A by recovery, α by not-recovery, B by treatment, β by not-treatment, suppose the frequencies are

                  B        β      TOTALS
    A            100      200      300
    α             50      100      150
    TOTALS       150      300      450

Here $(AB) = 100 = \dfrac{(A)(B)}{N}$, so that the attributes are independent. So far as can be seen, treatment exerts no effect on recovery.

Denoting male sex by C and female sex by γ, suppose the frequencies among males and females are

    Males                                     Females
                 BC      βC    TOTALS                      Bγ      βγ    TOTALS
    AC           80     100     180           Aγ           20     100     120
    αC           40      80     120           αγ           10      20      30
    TOTALS      120     180     300           TOTALS       30     120     150

In the male group we now have

$$Q_{AB.C} = \frac{(80 \times 80) - (100 \times 40)}{(80 \times 80) + (100 \times 40)} = 0.231$$

and in the female group

$$Q_{AB.\gamma} = -0.429.$$

Thus among the males treatment is positively associated with recovery, and among the females negatively associated. The apparent independence in the two together is due to the cancelling of these associations in the sub-populations.
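The figures of this example can be checked mechanically, together with identity (13.25). The following is a minimal sketch: the tuple layout and helper names are illustrative assumptions, and Q is taken as (ad - bc)/(ad + bc).

```python
def yule_q(a, b, c, d):
    """Coefficient of association Q = (ad - bc)/(ad + bc)."""
    return (a * d - b * c) / (a * d + b * c)

def delta(ab, a, b, n):
    """Excess of a cell frequency over its independence value, (AB) - (A)(B)/n."""
    return ab - a * b / n

def margins(t):
    """Return (A), (B) and the total for a table (AB, A.beta, alpha.B, alpha.beta)."""
    a, b, c, d = t
    return a + b, a + c, a + b + c + d

# A = recovery, B = treatment; males are the class C, females the class gamma.
males = (80, 100, 40, 80)
females = (20, 100, 10, 20)
pooled = tuple(m + f for m, f in zip(males, females))

q_m, q_f, q_all = yule_q(*males), yule_q(*females), yule_q(*pooled)

A_m, B_m, n_m = margins(males)       # (AC), (BC), (C)
A_f, B_f, n_f = margins(females)     # (A.gamma), (B.gamma), (gamma)
A, B, N = margins(pooled)

d_ab = delta(pooled[0], A, B, N)
d_ac = delta(A_m, A, n_m, N)
d_bc = delta(B_m, B, n_m, N)
d_ab_c = delta(males[0], A_m, B_m, n_m)
d_ab_g = delta(females[0], A_f, B_f, n_f)

# Right-hand side of (13.25), rearranged to give delta_AB.
rhs = N / (n_m * n_f) * d_ac * d_bc + d_ab_c + d_ab_g
print(round(q_m, 3), round(q_f, 3), round(q_all, 3), d_ab, rhs)
```

Here the two partial deltas do not sum to zero; their sum is exactly offset by the product term, because sex is associated both with recovery and with treatment.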


Contingency 
13.12. We now turn to the more general case in which a population is divided into a number of categories $A_1, A_2, \ldots, A_p$ instead of simply dichotomised into two, A and not-A. If there is a second classification into $B_1, B_2, \ldots, B_q$ the frequencies may be arranged in the form:

                 A_1         A_2        ...      A_p        TOTALS
    B_1       (A_1 B_1)   (A_2 B_1)     ...   (A_p B_1)     (B_1)
    B_2       (A_1 B_2)   (A_2 B_2)     ...   (A_p B_2)     (B_2)
    ...          ...         ...        ...      ...         ...
    B_q       (A_1 B_q)   (A_2 B_q)     ...   (A_p B_q)     (B_q)
    TOTALS     (A_1)       (A_2)        ...    (A_p)          N          (13.27)


This is known as a contingency table. We have already noticed an example in Table 12.4. Ordinary bivariate frequency tables can, of course, be regarded as contingency tables but there is this difference: in the bivariate table the order of rows and columns is determined by the variate-values, whereas in the contingency table the order of rows and columns is, in general, arbitrary.

In (13.27) the frequency in the ith column and jth row is denoted by $(A_iB_j)$. As in the case of the 2 × 2 table we write

$$\delta_{ij} = (A_iB_j) - \frac{(A_i)(B_j)}{N} \qquad (13.28)$$

and define the attributes as independent if every δ is zero. We have:


$$\sum_{i,j}\delta_{ij} = \sum_{i,j}(A_iB_j) - \frac{\sum_i (A_i)\,\sum_j (B_j)}{N} = N - \frac{N^2}{N} = 0, \qquad (13.29)$$

and, in conformity with the notation of (12.6) we have

$$\chi^2 = \sum_{i,j}\frac{N\,\delta_{ij}^2}{(A_i)(B_j)}. \qquad (13.30)$$

$\chi^2$ is sometimes called the square contingency and $\chi^2/N$ the mean square contingency.
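As a rough illustration of these definitions, the square contingency of any p × q table can be accumulated cell by cell from the "independence" frequencies. The function name and the small table used below are hypothetical and serve only to show the arithmetic.

```python
def square_contingency(table):
    """Return (chi2, chi2 / N) for a table given as a list of rows of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for j, row in enumerate(table):
        for i, obs in enumerate(row):
            indep = col_tot[i] * row_tot[j] / n   # "independence" frequency
            d = obs - indep                       # delta_ij as in (13.28)
            chi2 += d * d / indep
    return chi2, chi2 / n

# Hypothetical 2 x 3 table, for illustration only.
chi2, mean_square = square_contingency([[10, 20, 30], [20, 20, 20]])
print(round(chi2, 4), round(mean_square, 4))

# A table with proportional rows gives zero square contingency.
chi2_indep, _ = square_contingency([[2, 4], [3, 6]])
```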


13.13. We have already seen in Chapter 12 how $\chi^2$ may be used to test the hypothesis that the observed frequencies could have arisen by random sampling from a population in which the attributes were independent. We now proceed to consider the construction of measures of dependence.

Evidently $\chi^2 = 0$ if and only if each δ = 0. Thus $\chi^2$ vanishes if and only if the attributes are independent in the observed population. Furthermore, as $\chi^2$ becomes larger the observed frequencies deviate more and more from the "independence" frequencies; it thus provides some sort of measure of the strength of relationship in contingency tables. For example, in the 2 × 2 table we have

$$V^2 \ (\text{equation (13.16)}) = \frac{\chi^2}{N}, \qquad (13.31)$$

which illustrates the relationship between V and $\chi^2$.


$\chi^2$ itself, however, does not constitute a very useful coefficient since it may increase without limit. Following Karl Pearson we may put

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \qquad (13.32)$$

and call C Pearson's coefficient of contingency. Even this has its limitations. It vanishes, as it should, when there is complete independence; but in general it cannot attain unity. Consider, for example, a t × t table in which the diagonals $(A_iB_i)$ are of frequency $a_i$ and all other compartments are zero. Obviously no greater degree of dependence is possible. We then have

$$\chi^2 = N\left\{\sum_i \frac{a_i^2}{a_i \cdot a_i} - 1\right\} = N(t - 1)$$

and

$$C = \sqrt{\frac{t-1}{t}}. \qquad (13.33)$$

If t = 5, for example, the maximum value of C is 0.894.


To remedy this defect Tschuprow proposed the coefficient

$$T = \sqrt{\frac{\chi^2}{N\sqrt{(p-1)(q-1)}}}. \qquad (13.34)$$

This can attain unity when p = q; but it is still not clear how it behaves when p and q are unequal.
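The ceiling on C, and the way T removes it for square tables, can be seen numerically. The sketch below takes a purely diagonal t × t table with hypothetical frequencies; the function names are illustrative, and the formulas are (13.32) and (13.34) as given above.

```python
from math import sqrt

def pearson_c(chi2, n):
    """Pearson's coefficient of contingency, equation (13.32)."""
    return sqrt(chi2 / (n + chi2))

def tschuprow_t(chi2, n, p, q):
    """Tschuprow's coefficient, equation (13.34)."""
    return sqrt(chi2 / (n * sqrt((p - 1) * (q - 1))))

def chi2_diagonal(diag):
    """Square contingency of a t x t table whose only non-zero cells are diagonal."""
    n = sum(diag)
    # Each diagonal cell contributes a_i**2 / (a_i * a_i) = 1 to the sum.
    return n * (sum(a * a / (a * a) for a in diag) - 1)

diag = [12, 7, 30, 5, 46]        # hypothetical diagonal frequencies, t = 5
n, t = sum(diag), len(diag)
chi2 = chi2_diagonal(diag)
c_max = pearson_c(chi2, n)
t_max = tschuprow_t(chi2, n, t, t)
print(round(c_max, 3), round(t_max, 3))
```

Whatever diagonal frequencies are chosen, C depends only on t in this case, whereas T reaches unity.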


Example 13.6 
TABLE 13.1

Distribution of Schoolchildren according to Intelligence and Standard of Clothing.
(From W. H. Gilby (1911), Biometrika, 8, 94.)

                          A and B     C      D      E      F      G    TOTALS
    Very well clad           33      48    113    209    194     39      636
    Well clad                41     100    202    255    138     15      751
    Poor but passable        39      58     70     61     33      4      265
    Very badly clad          17      13     22     10     10      1       73
    TOTALS                  130     219    407    535    375     59     1725

The above table shows the distribution of 1725 schoolchildren who were classified (1) according to their standard of clothing and (2) according to their intelligence, the standards in the latter case being A = mentally deficient, B = slow and dull, C = dull, D = slow but intelligent, E = fairly intelligent, F = distinctly capable, G = very able. Required to discuss whether there is any association between standards of clothing and intelligence.

We note in the first place that a table of this kind could, theoretically, be discussed by considering all the possible 2 × 2 comparisons to be extracted from it; e.g. for the corners of the table we have

                          A and B      G     TOTALS
    Very well clad           33       39       72
    Very badly clad          17        1       18
    TOTALS                   50       40       90

Here, for example, 54 per cent. of the very well clad were very able, but only 5 per cent. of the very badly clad. However, what we really require is not a series of individual comparisons of this kind but a general comparison over the whole table, and it is for such purposes that the coefficient of contingency is designed.


We then proceed to work out the "independence" frequencies, e.g. that in the top left-hand corner of the table is $\dfrac{130 \times 636}{1725} = 47.930$. The contribution to $\chi^2$ from this compartment is then $\dfrac{(33 - 47.930)^2}{47.930} = 4.651$. It will be found that the sum of the contributions from the 24 compartments is 174.92. We then have

$$C = \sqrt{\frac{174.92}{1725 + 174.92}} = 0.303,$$

indicating a considerable degree of association.

For the Tschuprow coefficient we have

$$T = \sqrt{\frac{174.92}{1725\sqrt{15}}} = 0.162.$$

There is evidently some general relationship between the two attributes, though not a very strong one. The reader may verify for himself by using the $\chi^2$ test that the values of C and T are significant, i.e. could not have arisen by sampling from independent attributes in all probability.
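The arithmetic of this example is easily mechanised. The sketch below recomputes the independence frequencies, the square contingency, C and T from the 24 cell frequencies of Table 13.1, with the four rows in the order of the table and the six columns "A and B" to G. The variable names are illustrative, and the computed total may differ from the figure quoted in the text in the later decimal places, as hand computation often does.

```python
from math import sqrt

# Rows: very well clad, well clad, poor but passable, very badly clad.
table = [
    [33, 48, 113, 209, 194, 39],
    [41, 100, 202, 255, 138, 15],
    [39, 58, 70, 61, 33, 4],
    [17, 13, 22, 10, 10, 1],
]
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
n = sum(row_tot)

chi2 = 0.0
for j, row in enumerate(table):
    for i, obs in enumerate(row):
        indep = row_tot[j] * col_tot[i] / n        # "independence" frequency
        chi2 += (obs - indep) ** 2 / indep

top_left = row_tot[0] * col_tot[0] / n
p, q = len(col_tot), len(row_tot)
c = sqrt(chi2 / (n + chi2))                        # Pearson's C
t = sqrt(chi2 / (n * sqrt((p - 1) * (q - 1))))     # Tschuprow's T
print(n, round(top_left, 3), round(chi2, 2), round(c, 3), round(t, 3))
```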


13.14. The sampling variance of the coefficient of contingency is difficult to arrive at in virtue of sheer algebraical complexity; and it is not clear how far the use of a standard error is legitimate in this connection. Reference may be made for the formulae to K. Pearson (1915a) and Kondo (1929). For the even more complicated question of partial contingency see K. Pearson (1915b).


13.15. In concluding this chapter we point out that all the measures of association and contingency discussed therein in no way rely on the possibility of the measurement of attributes on a variate-scale, or even on the possibility of arranging them in order. Rearrangement of rows and columns in the two-way tables does not affect the values of the coefficients.* In the next chapter we shall consider the relationship between variates and certain coefficients based on the assumption that the attribute classifications are made according to the divisions of a variate-scale. These coefficients (tetrachoric r, biserial η, etc.) have been used as measures of association, but they are essentially different in character from those discussed in this chapter. The reader who refers to memoirs written on this subject between 1900 and 1920 will find it useful to remember this fact.

* It may change the sign of a coefficient of association. This is equivalent to a slight change of standpoint in what is regarded as a positive association; for example, positive association between fair hair and blue eyes is equivalent to negative association between fair hair and not-blue eyes.


NOTES AND REFERENCES 


The fundamental memoir on association of attributes is that of Yule (1900), who introduced the coefficient Q in it. In a later paper (1912) Yule reviewed the whole subject and proposed the coefficient denoted in this chapter by Y. This memoir contained some criticisms of Karl Pearson's coefficient now known as tetrachoric r (cf. Chapter 14) and evoked a reply from Pearson and Heron (1913) which is remarkable for having missed the point over more pages (173) than perhaps any other memoir in statistical history.

Pearson's coefficient of contingency C was introduced in 1904. Corrections to his coefficient were subsequently proposed, being based on the notion of an underlying variate (K. Pearson, 1913). For references to the other coefficients proposed on this basis, see Chapter 14.

Kondo, T. (1929), "On the standard error of the mean square contingency," Biometrika.
Pearson, K. (1904), "On the theory of contingency and its relation to association and normal correlation," Drapers' Company Research Memoirs, Biometric Series 1, Dulau and Co., London.
Pearson, K. (1913), "On the measurement of the influence of broad categories on correlation," Biometrika, 9, 116.
Pearson, K., and Heron, D. (1913), "On theories of association," Biometrika, 9, 159.
Pearson, K. (1915a), "On the probable error of a coefficient of mean square contingency," Biometrika, 10, 590.
Pearson, K. (1915b), "On the general theory of multiple contingency, with special reference to partial contingency," Biometrika, 11, 145.
Yule, G. U. (1900), "On the association of attributes in statistics," Phil. Trans., A, 194, 257.
Yule, G. U. (1912), "On the methods of measuring the association between two attributes," Jour. Roy. Statist. Soc., 75, 579.


EXERCISES 


13.1. Show that the coefficient of association is greater in absolute value than the coefficient of colligation, except when both are zero or unity in absolute value.

13.2. Show that for a contingency table with a constant number of rows and columns the Pearson coefficient of contingency C is equal to the Tschuprow coefficient T for two values of $\chi^2/N$, one of which is zero; that for $\chi^2/N$ between these values C > T, and for $\chi^2/N$ greater than the higher value T > C.

13.3. The following table shows 68 lobelia plants classified according to whether they were cross- or self-fertilised and above or below average height.

                          Above Average.   Below Average.   TOTALS.
    Cross-fertilised            17               17            34
    Self-fertilised             12               22            34
    TOTALS                      29               39            68

Show that Y = 0.150 and that this is not significant of association if these data are a random sample from lobelia plants generally.

13.4. In the hair- and eye-colour data of Table 12.4 show that C = 0.37 and T = 0.25.

13.5. In a paper discussing whether laterality of hand is associated with laterality of eye (measured by astigmatism, acuity of vision, etc.) T. L. Woo obtained the following results (Biometrika, 20A, pp. 79-148):

                         Ocular Laterality for General Astigmatism.
                     "Left-eyed."   Ambiocular.   "Right-eyed."   TOTALS.
    Left-handed          34             62             28           124
    Ambidextrous         27             28             20            75
    Right-handed         57            105             52           214
    TOTALS              118            195            100           413

Show that laterality of eye is only slightly associated with laterality of hand, and that the association is not significant.

13.6. In the notation of 13.5 show that

$$Q = \frac{2Y}{1 + Y^2}.$$


CHAPTER 14 


PRODUCT-MOMENT CORRELATION 


* 


14.1. At the end of Chapter 1 there were given a few examples of bivariate frequency 
tables. We now proceed to consider such tables in greater detail and to discuss methods 
of measuring the dependence of the two variates represented in them. It is, of course,
possible to treat the problem by the methods of the previous chapter and regard the tables 
as contingency tables ; but when data are classified according to a numerical variable more 
exact methods are available in an important class of cases. 

The types of bivariate distribution arising in practice are not so easy to classify as the 
univariate types. Table 1.15 on page 20, showing the distribution of beans according to 
length and breadth, and Table 1.25 on page 27 showing the number of cows according to 
age and milk yield, evidently correspond more or less to the unimodal univariate distribution, 
for not only the border frequencies but the frequencies in individual rows and columns are 
of the unimodal type. Biometric distributions are often of this character. On the other 
hand, Table 1.26 on page 28, showing discount rates and bank reserves, has the border 
column of the unimodal type and the border row of the J-shaped type. In Tables 14.1 to 
14.3 are given three more examples of the kind of material encountered in practice.
Table 14.1 shows the distribution of a number of persons according to age and highest 
audible pitch ; Table 14.2 the distribution of registration districts according to proportion 
of male births and total number of births ; and Table 14.3 shows the distribution of sons 
according to stature of son and stature of father. 


TABLE 14.1

Distribution of 3379 Persons according to Age and Highest Audible Pitch.

(Columns: age in years. Rows: highest audible pitch in thousand vibrations per second. The numbers in brackets are explained in Example 14.1. The body of the table and its source citation are not legible in this copy.)




TABLE 14.2 


Showing the Number of Registration Districts in England and Wales exhibiting (1) a given 
Proportion of Male Births, (2) a given Total Number of Births during the Decade 1881-90. 
(The Data as to Total Births and Numbers of Male and Female Births from Decennial 


Supplement to Report of the Registrar-General. Table from H. D. Vigor and G. U. Yule, 
Jour. Roy. Stat. Soc., 69, 1906.) 


(Columns: (1) proportion of male births per 1000 of all births. Rows: (2) total number of births, in thousands. The body of the table is not legible in this copy.)


14.2. In accordance with the definitions of the previous chapter we may say that two variables are independent in a bivariate table if the observed frequency in the jth row and ith column, $(A_iB_j)$, is equal to $(A_i)(B_j)/N$. In such a case, for any two rows j and k the frequencies will be proportional to $(A_i)(B_j)$ and $(A_i)(B_k)$; so that the distribution in any row is similar and similarly situated to that in any other row. It will, for example, have the same mean and variance. Similarly for the columns.

The measures of dependence we shall consider are related to the extent to which row and column means and variances differ among themselves. If the variates are independent all row means are equal and all column means are equal. The converse is not true in general, but becomes so in an important special case when the distribution is normal.




TABLE 14.3 


Distribution of 1078 Sons according to (1) Stature of Father and (2) Stature of Son: One 
or Two Sons only of each Father. 


(From Karl Pearson and Alice Lee, Biometrika, 2 (1903), 415.) 


Measurements in inches. Note that where a height falls on the border-line of an interval, 
one-half of the individual is assigned to each contiguous interval. 


(Columns: (1) stature of father. Rows: (2) stature of son. Total frequency 1078. The body of the table is not legible in this copy.)

Regression 


14.3. The rows and columns will be referred to by the general term "arrays" and we shall consider the two variates x and y, x being taken to vary horizontally, i.e. in rows, and y vertically, i.e. in columns. Consider then the means of arrays. Take two axes OX and OY at right angles representing the variates x and y. On this frame of reference plot the points whose abscissae are the centres of the x-intervals and whose ordinates are the means of the corresponding distributions of y in the columns centred at the appropriate x's. Similarly, plot the means of the x-distributions against the centres of the corresponding y-intervals. (In practice it is useful to distinguish the two sets of points, one being denoted by small circles and the other by crosses.) The result is shown in Fig. 14.1 for the data of Table 14.3 and in Fig. 14.2 for the data of Table 14.2.

The two sets of points may be regarded as lying on or round smooth curves. For example, in Fig. 14.1 they lie approximately on straight lines, whereas in Fig. 14.2 one set of means does not. Such curves are called regression curves and their equations are called regression equations. If the lines are straight the regression is said to be linear; if not, curvilinear or skew.

To put these geometrically expressed ideas in analytical language, suppose the mean of x in the array centred at $y_j$ is $\bar{x}_j$. Then the points $(\bar{x}_j, y_j)$ may be represented by a functional equation

$$\bar{x} = f(y), \qquad (14.1)$$

which is the regression equation of x on y. If the regression is linear,

$$\bar{x} = \beta y + \alpha.$$

There will also be an equation

$$\bar{y} = g(x), \qquad (14.2)$$

the regression of y on x.

Fig. 14.1. Regression Lines of Data of Table 14.3 (axes: Father's Stature and Son's Stature, in inches). Means of rows shown by circles, and corresponding regression line by RR; means of columns shown by crosses, and corresponding regression line by CC.

Fig. 14.2. Regression Lines of Data of Table 14.2 (axes: Proportion of Male Births per 1000 births and Total number of births in 1000's). Means of rows shown by circles, and corresponding regression line by RR; means of columns shown by crosses, and regression line by CC.


In this chapter we shall mainly be concerned with the case in which regressions are 
linear or very nearly so. 


14.4. In an observed distribution the means of arrays will not as a rule lie exactly on straight lines, or indeed on any simple curves, although they may be very near to doing so. The question then arises: if the regression is "approximately" linear, what is the best line to take as the regression line? The question may be answered by an appeal to the method of least squares. The regression of x on y, say $x = \beta y + \alpha$, will be determined by minimising the sum

$$\sum_j N_j(\bar{x}_j - \beta y_j - \alpha)^2, \qquad (14.3)$$

the summation extending over all y-intervals. Here $N_j$ is the frequency in the jth row, and we note that $\sum(N_j\bar{x}_j)$ is equal to the total frequency N times the mean of x for the whole distribution.

From (14.3) we have, for the minimal values of α and β,

$$\frac{\partial}{\partial\alpha}:\quad -2\sum_j N_j(\bar{x}_j - \beta y_j - \alpha) = 0, \qquad (14.4)$$

$$\frac{\partial}{\partial\beta}:\quad -2\sum_j N_j y_j(\bar{x}_j - \beta y_j - \alpha) = 0. \qquad (14.5)$$

Now choose the origin at the means of x and y for the distribution. Then $\sum N_j\bar{x}_j = 0$ and $\sum N_j y_j = 0$ and hence, from (14.4),

$$\alpha = 0.$$

From (14.5) we then have

$$\sum(N_j y_j \bar{x}_j) - \beta\sum(N_j y_j^2) = 0.$$

Now since the origin has been chosen at the mean of x and y, $\sum(N_j y_j\bar{x}_j) = N\,\mathrm{cov}(x, y)$ and $\sum(N_j y_j^2) = N\,\mathrm{var}\,y$. Hence we have

$$\beta = \frac{\mathrm{cov}(x, y)}{\mathrm{var}\,y}. \qquad (14.6)$$

The equation of the regression of x on y, taking X and Y to be current co-ordinates, will then be

$$X = \frac{\mathrm{cov}(x, y)}{\mathrm{var}\,y}\,Y. \qquad (14.7)$$

Referred to an arbitrary origin for which the means of x, y are $\bar{x}$, $\bar{y}$, the equation is

$$(X - \bar{x}) = \frac{\mathrm{cov}(x, y)}{\mathrm{var}\,y}\,(Y - \bar{y}). \qquad (14.8)$$

Similarly we find for the regression of y on x

$$(Y - \bar{y}) = \frac{\mathrm{cov}(x, y)}{\mathrm{var}\,x}\,(X - \bar{x}). \qquad (14.9)$$

Equations (14.8) and (14.9) are fundamental. If the regressions are exactly linear they give the regression equations; if not, they give the "best" straight regression lines in the sense of the method of least squares.
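As a small numerical illustration of (14.8) and (14.9), the two regression slopes can be computed directly from the covariance and the two variances. The sketch below works on ungrouped pairs rather than on a grouped table, which is an assumption made for brevity; the data and the function name are hypothetical.

```python
def regression_lines(xs, ys):
    """Means and least-squares slopes of x on y and of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    slope_x_on_y = cov / var_y    # (X - mx) = slope_x_on_y * (Y - my), as in (14.8)
    slope_y_on_x = cov / var_x    # (Y - my) = slope_y_on_x * (X - mx), as in (14.9)
    return (mx, my), slope_x_on_y, slope_y_on_x

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
(mx, my), b_xy, b_yx = regression_lines(xs, ys)
print(mx, my, b_xy, b_yx)
```

Both lines pass through the point of means, and they coincide only when the points lie exactly on a straight line.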


The Coefficient of Product-moment Correlation 


14.5. The coefficients $\dfrac{\mathrm{cov}(x, y)}{\mathrm{var}\,y}$ and $\dfrac{\mathrm{cov}(x, y)}{\mathrm{var}\,x}$ are called regression coefficients and will be denoted by $\beta_1$ and $\beta_2$ respectively.*

We now define

$$\rho = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}\,x\ \mathrm{var}\,y}} \quad\text{or}\quad \rho^2 = \beta_1\beta_2. \qquad (14.10)$$

* There is little danger of confusion between this notation and the use of $\beta_1$, $\beta_2$ to indicate measures of skewness and kurtosis. The two rarely occur in the same context.

ρ is called the coefficient of product-moment correlation or briefly the correlation coefficient. It provides an important measure of the relation between two variates for which the regressions are approximately linear. In this expression the square root is to have positive sign.

Let us note in the first place that ρ cannot be greater than unity in absolute value. For we have, taking an origin at the means and summing for all pairs of values of x, y over the population,

$$\begin{aligned}
\sum(x - \beta_1 y)^2 &= \sum(x^2) - 2\beta_1\sum(xy) + \beta_1^2\sum(y^2) \\
&= \sum(x^2)\left\{1 - \frac{2\beta_1\sum(xy)}{\sum(x^2)} + \frac{\beta_1^2\sum(y^2)}{\sum(x^2)}\right\} \\
&= \sum(x^2)\,(1 - 2\beta_1\beta_2 + \beta_1\beta_2) \\
&= \sum(x^2)\,(1 - \rho^2). \qquad (14.11)
\end{aligned}$$

Thus $1 - \rho^2$ cannot be negative.

Furthermore, if ρ = ±1, $\sum(x - \beta_1 y)^2 = 0$ and hence every $x - \beta_1 y = 0$. Thus the variates are linearly related by the equation $X - \beta_1 Y = 0$. If ρ = 0 the regression equations become X = 0, Y = 0, for then cov(x, y) = 0. Hence the means of arrays are the same for all arrays.

14.6. If ρ = +1 we say that the variates are perfectly positively correlated; if 0 < ρ < 1, that they are positively correlated; if ρ = 0, that they are uncorrelated; if 0 > ρ > -1, that they are negatively correlated; and if ρ = -1, that they are perfectly negatively correlated.

"Uncorrelated" is not the same thing as "independent." If the variates are independent they are uncorrelated, but not vice versa. Table 14.2 and Fig. 14.2 illustrate this point. The regression lines, as shown in the figure, are close to X = 0, Y = 0 and the correlation is, in fact, very small (-0.014). But the variates are obviously far from independence and if the data are grouped, in columns up to 494.5, by single columns up to 521.5, and over 521.5, and by rows 8-11, 12-13, 14-15, 16-17, 18-19, 20-21, 22-23, 24-25, 26-27, 28 and over, the coefficient of contingency is 0.47.
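A deliberately artificial case makes the same point in miniature. In the sketch below y is an exact function of x, so the variates are as dependent as they can be, yet the product-moment correlation vanishes because the dependence is symmetrical about the mean of x. The data and the function name are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Product-moment correlation coefficient of paired values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]      # y is determined exactly by x
rho = correlation(xs, ys)
print(rho)
```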


14.7. The calculation of the correlation coefficient in numerical examples requires that of the means and variances of the two variates and their covariance. The last is the only new type appearing, the others being calculable from border frequencies in the manner exemplified in Chapter 3.

Taking an arbitrary origin, we have, if the means of x and y are a and b,

$$\begin{aligned}
N\,\mathrm{cov}(x, y) &= \sum(x - a)(y - b) \\
&= \sum(xy) - a\sum(y) - b\sum(x) + Nab \\
&= \sum(xy) - Nab,
\end{aligned}$$

$$\mathrm{cov}(x, y) = \frac{1}{N}\sum(xy) - ab. \qquad (14.12)$$

Thus we may, as in the univariate case, take an arbitrary origin for arithmetical convenience, calculate the product sum $\sum(xy)$ and determine the covariance by the use of (14.12). The calculation of the product sum is exemplified below. As seen in 3.30, no Sheppard's corrections are required for the first product moment.


Example 14.1 


To find the correlation coefficient and regression lines for the data of Table 14.1.

We find the means and variances from the border column and row in the usual way. A working mean is taken for x (age) at the centre of the interval 20.5- and for y (highest pitch) at the centre of the interval 19- thousand vibrations, which may be taken as 19,995. We find (Sheppard's corrections having been applied to the second moments)

$$\sum(x) = 2{,}604, \qquad \mu_1'(x) = 0.770{,}642$$
$$\sum(y) = -708, \qquad \mu_1'(y) = -0.209{,}529$$
$$\sum(x^2) = 47{,}392, \qquad \mu_2(x) = 13.348{,}229$$
$$\sum(y^2) = 8894, \qquad \mu_2(y) = 2.504{,}904.$$

To find the product sum $\sum(xy)$ we require the product xy for each non-zero cell of the table. These products are shown in brackets in Table 14.1. Then we have, reading the table from left to right and from top to bottom,

$$\sum(xy) = (1 \times 0) + (1 \times 14) + (1 \times 84) + (\ldots \times 18) + (1 \times 12) + (3 \times 6) + \text{etc.} = -12{,}535,$$

whence

$$\mathrm{cov}(x, y) = \frac{-12{,}535}{3379} - (0.770{,}642)(-0.209{,}529) = -3.548{,}205.$$

Thus

$$\rho = \frac{\mathrm{cov}(x, y)}{\sqrt{\mu_2(x)\,\mu_2(y)}} = -0.6136,$$

a substantial negative correlation. The highest audible pitch decreases with increasing age.

We also find

$$\beta_1 = \frac{\mathrm{cov}(x, y)}{\mathrm{var}\,y} = -1.417, \qquad \beta_2 = \frac{\mathrm{cov}(x, y)}{\mathrm{var}\,x} = -0.2658.$$

The regression equations, for the units of the table and with our arbitrary means, are then

$$X - 0.7706 = -1.417\,(Y + 0.2095)$$
$$Y + 0.2095 = -0.2658\,(X - 0.7706).$$
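The whole of this calculation follows from the five sums and the total frequency, and may be retraced as below. This is a sketch under one stated assumption: the variances quoted in the example agree with the raw second moments less Sheppard's correction of 1/12 (in class-interval units), so that correction is applied here; no correction is applied to the product moment. Variable names are illustrative.

```python
from math import sqrt

n = 3379
sum_x, sum_y = 2604, -708
sum_x2, sum_y2, sum_xy = 47392, 8894, -12535

a, b = sum_x / n, sum_y / n            # means about the working origin
# Sheppard's correction (1/12 in class-interval units), variances only.
var_x = sum_x2 / n - a * a - 1 / 12
var_y = sum_y2 / n - b * b - 1 / 12
cov = sum_xy / n - a * b               # equation (14.12)

rho = cov / sqrt(var_x * var_y)
beta1 = cov / var_y                    # regression coefficient of x on y
beta2 = cov / var_x                    # regression coefficient of y on x
print(round(cov, 6), round(rho, 4), round(beta1, 3), round(beta2, 4))
```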


` Example 14.2 


The following device is often useful in calculating product moments. We recall that

$$\begin{aligned}
2\sum(xy) &= \sum(x + y)^2 - \sum(x^2) - \sum(y^2) \\
&= \sum(x^2) + \sum(y^2) - \sum(x - y)^2.
\end{aligned}$$

Thus we may find $\sum(xy)$ from either $\sum(x + y)^2$ or $\sum(x - y)^2$, and these quantities are often more convenient to calculate.

For example, in the preceding example we note that x + y is constant down the diagonals running from the bottom right-hand to the top left-hand corner of the table. Taking x + y to be zero in the cell centred at x = 20.5-, y = 19-, we may, in our units, take it to be +1 in the cell 23.5-, 19-, and -1 in 17.5-, 19-, and so on. If we sum up the diagonals we get:

    x + y.    Sum.        x + y.    Sum.
      -9         1           4       124
      -8         1           5       112
      -7         5           6        90
      -6        11           7        59
      -5        20           8        38
      -4        93           9        23
      -3       207          10        21
      -2       434          11         9
      -1       594          12         9
       0       637          13         2
       1       418          14         4
       2       281          15         1
       3       185
                          TOTAL     3379

The total is 3379, which provides a check on the work. We then find the sum of squares in the ordinary way, obtaining

$$\sum(x + y)^2 = 31{,}216 = \sum(x^2) + \sum(y^2) + 2\sum(xy).$$

Then

$$\sum(xy) = \tfrac{1}{2}(31{,}216 - 47{,}392 - 8894) = -12{,}535 \text{ as before.}$$

The rest of the calculation follows the same lines as in Example 14.1.

We should have obtained the same result for $\sum(xy)$ if we had summed up the other diagonal. Which diagonal is chosen depends on how the frequencies lie in the table.
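The diagonal device lends itself to a quick check. In the sketch below the diagonal frequencies are keyed by the value of x + y in working units. The entry for x + y = 6 is taken as 90, this being the value for which the total comes to 3379, the sum of x + y to 2604 - 708 = 1896, and the sum of squares to 31,216. The variable names are illustrative.

```python
# Frequencies on each diagonal of Table 14.1, keyed by x + y in working units.
diag = {
    -9: 1, -8: 1, -7: 5, -6: 11, -5: 20, -4: 93, -3: 207, -2: 434, -1: 594,
    0: 637, 1: 418, 2: 281, 3: 185, 4: 124, 5: 112, 6: 90, 7: 59, 8: 38,
    9: 23, 10: 21, 11: 9, 12: 9, 13: 2, 14: 4, 15: 1,
}

total = sum(diag.values())                           # check on the work
sum_s = sum(s * f for s, f in diag.items())          # should equal sum(x) + sum(y)
sum_s2 = sum(s * s * f for s, f in diag.items())     # sum of (x + y)^2

sum_x2, sum_y2 = 47392, 8894                         # from Example 14.1
sum_xy = (sum_s2 - sum_x2 - sum_y2) // 2
print(total, sum_s, sum_s2, sum_xy)
```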


Example 14.3 




In the foregoing the regression lines and the correlation coefficient were arrived at 
from a consideration of grouped frequencies in a bivariate table. We may, however, apply 
the same ideas to ungrouped material. There are no longer means of arrays, but the 
regression lines are still to be interpreted as the lines of best fit to the N pairs of variate- 
values and the correlation coefficient as a measure of relationship between variates. 

Table 14.4 shows the yields of wheat and potatoes in 48 counties of England in 1936. 


In this particular case it is hardly worth while taking an arbitrary origin other than that given. We find (x = wheat, y = potatoes)


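\[
\begin{aligned}
\Sigma(x) &= 758.0, & \mu_1'(x) &= 15.791667 \\
\Sigma(y) &= 291.1, & \mu_1'(y) &= 6.064583 \\
\Sigma(x^2) &= 12{,}170.48, & \mu_2(x) &= 4.174930 \\
\Sigma(y^2) &= 1791.03, & \mu_2(y) &= 0.533958 \\
\Sigma(xy) &= 4612.64, & \mu_{11}(x, y) &= 0.326888
\end{aligned}
\]
Then
\[ \rho = \frac{0.326888}{\sqrt{4.174930 \times 0.533958}} = 0.2189, \qquad \beta_1 = 0.6122, \qquad \beta_2 = 0.07830. \]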


TABLE 14.4 *
Yields of Wheat and Potatoes in 48 Counties in England in 1936.

County                     Wheat      Potatoes   | County                     Wheat      Potatoes
                           (cwts.     (tons      |                            (cwts.     (tons
                           per acre)  per acre)  |                            per acre)  per acre)
Bedfordshire               16.0       5.3        | Northamptonshire           14.3       4.9
Huntingdonshire            16.0       6.6        | Peterborough               14.4       5.6
Cambridgeshire             16.4       6.1        | Buckinghamshire            15.2       6.4
Isle of Ely                20.5       5.5        | Oxfordshire                14.1       6.9
Suffolk, West              18.2       6.9        | Warwickshire               15.4       5.6
Suffolk, East              16.3       6.1        | Shropshire                 16.5       6.1
Essex                      17.7       6.4        | Worcestershire             14.2       5.7
Hertfordshire              15.3       6.3        | Gloucestershire            13.2       5.0
Middlesex                  16.5       7.8        | Wiltshire                  13.8       6.5
Norfolk                    16.9       8.3        | Herefordshire              14.4       6.2
Lincs. (Holland)           21.8       5.7        | Somersetshire              13.4       5.2
Lincs. (Kesteven)          15.5       6.2        | Dorsetshire                11.2       6.6
Lincs. (Lindsey)           15.8       6.0        | Devonshire                 14.4       5.8
Yorkshire (East Riding)    16.1       6.1        | Cornwall                   15.4       6.3
Kent                       18.5       6.6        | Northumberland             18.5       6.3
Surrey                     12.7       4.8        | Durham                     16.4       5.8
Sussex, East               15.7       4.9        | Yorkshire (North Riding)   17.0       5.9
Sussex, West               14.3       5.1        | Yorkshire (West Riding)    16.9       6.5
Berkshire                  13.8       5.5        | Cumberland                 17.5       5.8
Hampshire                  12.8       6.7        | Westmorland                15.8       5.7
Isle of Wight              12.0       6.5        | Lancashire                 19.2       7.2
Nottinghamshire            15.6       5.2        | Cheshire                   17.7       6.5
Leicestershire             15.8       5.2        | Derbyshire                 15.2       5.4
Rutland                    16.6       7.1        | Staffordshire              17.1       6.3

Fig. 14.3. Scatter Diagram of the Data of Table 14.4 (abscissa: wheat yield, cwts. per acre; ordinate: potato yield, tons per acre).




The regression lines are
\[ X - 15.792 = 0.6122\,(Y - 6.065), \]
\[ Y - 6.065 = 0.0783\,(X - 15.792). \]
The data are shown in a graphical form in Fig. 14.3. Corresponding to each pair of values (x, y) there is plotted a point with those values as abscissa and ordinate. The totality of points furnishes what is known, for obvious reasons, as a scatter diagram. The two regression lines are also shown.
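The arithmetic of Example 14.3 can be retraced from the ungrouped data. The sketch below is illustrative rather than the author's own working: the 48 pairs are transcribed from Table 14.4, and the wheat figures for Devonshire (14.4) and Westmorland (15.8) are readings assumed here because they are the pair that reproduces the quoted totals $\Sigma(x) = 758.0$, $\Sigma(x^2) = 12{,}170.48$ and $\Sigma(xy) = 4612.64$.

```python
from math import sqrt

# (wheat cwts./acre, potatoes tons/acre), in the order of Table 14.4.
# Devonshire (14.4) and Westmorland (15.8) wheat yields are assumed readings.
data = [
    (16.0, 5.3), (16.0, 6.6), (16.4, 6.1), (20.5, 5.5), (18.2, 6.9), (16.3, 6.1),
    (17.7, 6.4), (15.3, 6.3), (16.5, 7.8), (16.9, 8.3), (21.8, 5.7), (15.5, 6.2),
    (15.8, 6.0), (16.1, 6.1), (18.5, 6.6), (12.7, 4.8), (15.7, 4.9), (14.3, 5.1),
    (13.8, 5.5), (12.8, 6.7), (12.0, 6.5), (15.6, 5.2), (15.8, 5.2), (16.6, 7.1),
    (14.3, 4.9), (14.4, 5.6), (15.2, 6.4), (14.1, 6.9), (15.4, 5.6), (16.5, 6.1),
    (14.2, 5.7), (13.2, 5.0), (13.8, 6.5), (14.4, 6.2), (13.4, 5.2), (11.2, 6.6),
    (14.4, 5.8), (15.4, 6.3), (18.5, 6.3), (16.4, 5.8), (17.0, 5.9), (16.9, 6.5),
    (17.5, 5.8), (15.8, 5.7), (19.2, 7.2), (17.7, 6.5), (15.2, 5.4), (17.1, 6.3),
]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xx = sum(x * x for x, _ in data)
sum_yy = sum(y * y for _, y in data)
sum_xy = sum(x * y for x, y in data)

mean_x, mean_y = sum_x / n, sum_y / n
var_x = sum_xx / n - mean_x ** 2          # second moment about the mean
var_y = sum_yy / n - mean_y ** 2
cov_xy = sum_xy / n - mean_x * mean_y     # product moment about the means

r = cov_xy / sqrt(var_x * var_y)          # correlation coefficient
b_x_on_y = cov_xy / var_y                 # regression of x on y
b_y_on_x = cov_xy / var_x                 # regression of y on x

print(round(r, 4), round(b_x_on_y, 4), round(b_y_on_x, 4))
```

The coefficients come out close to the values quoted in the example; small differences in the last decimal place reflect rounding in the hand computation.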


The Bivariate Normal Distribution 


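14.8. The distribution
\[ dF = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right\}\right] dx\,dy \tag{14.13} \]
has already arisen (5.24) as the natural extension to two variates of the univariate normal distribution. In writing $\rho$ in (14.13) we have anticipated a result which will now be proved, namely that $\rho$ in that equation is in fact the correlation coefficient of the distribution. The characteristic function of (14.13) is
\[ \phi(t, u) = \exp\left[-\tfrac{1}{2}\left(\sigma_1^2 t^2 + 2\rho\sigma_1\sigma_2\, tu + \sigma_2^2 u^2\right)\right], \]
whence
\[ \operatorname{var} x = \sigma_1^2, \qquad \operatorname{var} y = \sigma_2^2, \qquad \operatorname{cov}(x, y) = \rho\sigma_1\sigma_2, \]
and the correlation coefficient is $\rho\sigma_1\sigma_2/\sqrt{\sigma_1^2\sigma_2^2} = \rho$, as stated.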


The exponent in (14.13) may be written
\[ -\frac{1}{2(1-\rho^2)}\left\{\left(\frac{x}{\sigma_1} - \frac{\rho y}{\sigma_2}\right)^2 + (1-\rho^2)\frac{y^2}{\sigma_2^2}\right\}. \tag{14.14} \]
Thus for any fixed y, x is distributed normally about a mean given by
\[ \frac{x}{\sigma_1} = \frac{\rho y}{\sigma_2}, \]
and hence the means of the x-arrays of infinite thinness lie on the line
\[ x = \rho\frac{\sigma_1}{\sigma_2}\,y, \]
and this is the regression of x on y. Similarly the means of y-arrays lie on
\[ y = \rho\frac{\sigma_2}{\sigma_1}\,x, \]
the regression of y on x. Thus the regression lines of the bivariate normal surface are exactly linear.

Furthermore, from (14.14) it is seen that the variance of an array of x for fixed y is
\[ \sigma_1^2(1-\rho^2), \]
i.e. is independent of y. Similarly the variance of an array of y is
\[ \sigma_2^2(1-\rho^2) \]
and is independent of x.




Thus the variance of x-arrays is the same for all arrays; and so for y. Distributions for which this is true are called homoscedastic.

If $\rho = 0$ the distribution becomes the product of two normal distributions and the variates are independent. Thus two uncorrelated normal variates are independent.
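The linearity of the regression and the constancy of the array variance can be illustrated by direct numerical integration of (14.13). This is a sketch under stated assumptions: the function names, the parameter values and the simple Riemann-sum integration are choices made for the illustration, not part of the text.

```python
from math import exp, pi, sqrt


def density(x, y, s1, s2, rho):
    """Bivariate normal density of (14.13) with zero means."""
    q = (x / s1) ** 2 - 2 * rho * x * y / (s1 * s2) + (y / s2) ** 2
    return exp(-q / (2 * (1 - rho ** 2))) / (2 * pi * s1 * s2 * sqrt(1 - rho ** 2))


def array_moments(y, s1, s2, rho, half_width=12.0, steps=24001):
    """Mean and variance of x in the array at fixed y, by a Riemann sum over x."""
    h = 2 * half_width * s1 / (steps - 1)
    xs = [-half_width * s1 + i * h for i in range(steps)]
    w = [density(x, y, s1, s2, rho) for x in xs]
    total = sum(w)
    mean = sum(x * wi for x, wi in zip(xs, w)) / total
    var = sum((x - mean) ** 2 * wi for x, wi in zip(xs, w)) / total
    return mean, var


s1, s2, rho = 2.0, 1.5, 0.6          # illustrative parameter values
mean_a, var_a = array_moments(1.2, s1, s2, rho)
mean_b, var_b = array_moments(-0.5, s1, s2, rho)

# Theory: mean = rho * s1 * y / s2, variance = s1^2 * (1 - rho^2) for every y.
print(mean_a, rho * s1 * 1.2 / s2, var_a, var_b, s1 ** 2 * (1 - rho ** 2))
```

The two arrays have different means, lying on the regression line, but the same variance, which is the homoscedastic property.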


14.9. A criterion for linearity of regression for the general bivariate surface may be obtained in terms of moments or cumulants. Taking an origin at the means of the two variates we have, if the regression of x on y is linear,
\[ \bar{x} = \beta_1 y \]
or, if f(x, y) is the frequency function,
\[ \int x f(x, y)\,dx = \beta_1 y \int f(x, y)\,dx. \]
Multiplying both sides by $y^p$ and integrating over the range of y, we have
\[ \iint y^p x f(x, y)\,dx\,dy = \beta_1 \iint y^{p+1} f(x, y)\,dx\,dy \]
or
\[ \mu_{1p} = \beta_1 \mu_{0,p+1}. \tag{14.18} \]
In particular
\[ \mu_{11} = \beta_1 \mu_{02} \]
so that
\[ \mu_{02}\,\mu_{1p} = \mu_{11}\,\mu_{0,p+1}, \tag{14.19} \]
a condition on the central moments. Recalling the definition of bivariate cumulants
\[ \log \phi(t_1, t_2) = \sum_{r,s} \kappa_{rs}\frac{(it_1)^r (it_2)^s}{r!\,s!}, \]
we see that the univariate mean moments are related to cumulants in the same way as moments $\mu_{1p}$ to cumulants $\kappa_{1p}$, and hence we also have
\[ \kappa_{1p} = \beta_1 \kappa_{0,p+1} \tag{14.20} \]
\[ \kappa_{02}\,\kappa_{1p} = \kappa_{11}\,\kappa_{0,p+1}. \tag{14.21} \]
Similarly, if the regression of y on x is linear we shall have
\[ \mu_{20}\,\mu_{p1} = \mu_{11}\,\mu_{p+1,0}, \qquad \kappa_{20}\,\kappa_{p1} = \kappa_{11}\,\kappa_{p+1,0}. \tag{14.22} \]


Under certain conditions these equations are sufficient for linearity of regression. For instance, if (14.18) is true for all p, then
\[ \int y^p \left\{\int (x - \beta_1 y) f(x, y)\,dx\right\} dy = 0. \]
The expression in curly brackets is thus a function, not necessarily positive, whose moments vanish, and under certain general conditions this implies that the function itself vanishes, i.e. that
\[ \int (x - \beta_1 y) f(x, y)\,dx = 0 \]
or
\[ \bar{x} = \beta_1 y, \]
so that the regression of x on y is linear.


Example 14.4 (from Wicksell, Biometrika, 25, 126) 


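Consider the bivariate distribution of the squares of variates x, y which are distributed in the bivariate normal form. The characteristic function of these variates is proportional to
\[
\begin{aligned}
&\iint \exp\left[itx^2 + iuy^2 - \frac{1}{2(1-\rho^2)}\left\{\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right\}\right] dx\,dy \\
&\quad= \iint \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{x^2}{\sigma_1^2}\left(1 - 2\sigma_1^2(1-\rho^2)it\right) - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\left(1 - 2\sigma_2^2(1-\rho^2)iu\right)\right\}\right] dx\,dy.
\end{aligned}
\]
This is proportional to (compare Exercise 1.5)
\[
\begin{vmatrix}
\dfrac{1 - 2\sigma_1^2(1-\rho^2)it}{\sigma_1^2} & -\dfrac{\rho}{\sigma_1\sigma_2} \\[2ex]
-\dfrac{\rho}{\sigma_1\sigma_2} & \dfrac{1 - 2\sigma_2^2(1-\rho^2)iu}{\sigma_2^2}
\end{vmatrix}^{-1/2}
\]
which, except for constants, reduces to
\[ \left\{(1 - 2\sigma_1^2 it)(1 - 2\sigma_2^2 iu) - 4\rho^2\sigma_1^2\sigma_2^2 (it)(iu)\right\}^{-1/2}. \]
This is the characteristic function, for when t = u = 0 it reduces to unity.

Now the frequency-distribution represented by this function is evidently not normal; but its regressions are linear, for we have, taking logarithms,
\[ \psi = -\tfrac{1}{2}\log\left\{(1 - 2\sigma_1^2 it)(1 - 2\sigma_2^2 iu) - 4\rho^2\sigma_1^2\sigma_2^2 (it)(iu)\right\} \]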


giving, on identifying coefficients, 


and hence 


\[ \kappa_{20}\,\kappa_{p,1} = \kappa_{11}\,\kappa_{p+1,0}. \]
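One way of carrying out the identification of coefficients in Example 14.4 is sketched below. The abbreviations $a$ and $b$ and the explicit cumulant expressions are a reconstruction offered for illustration, not a quotation of the original working; they follow from expanding the logarithm of the characteristic function of the squared variates in powers of $it$ and to the first power of $iu$.

```latex
\begin{aligned}
a &= 2\sigma_1^2\,it, \qquad b = 2\sigma_2^2\,iu, \\
\psi &= -\tfrac12\log\{(1-a)(1-b) - \rho^2 ab\}
      = -\tfrac12\log(1-a) - \tfrac12\log\Bigl\{1 - b\Bigl(1 + \frac{\rho^2 a}{1-a}\Bigr)\Bigr\}, \\
\kappa_{p,0} &= (p-1)!\,2^{\,p-1}\sigma_1^{2p}, \qquad
\kappa_{p,1} = p!\,2^{\,p}\rho^2\sigma_1^{2p}\sigma_2^{2} \qquad (p \ge 1), \\
\kappa_{20}\,\kappa_{p,1} &= p!\,2^{\,p+1}\rho^2\sigma_1^{2p+4}\sigma_2^{2} = \kappa_{11}\,\kappa_{p+1,0}.
\end{aligned}
```

In particular $\kappa_{20} = 2\sigma_1^4$ and $\kappa_{11} = 2\rho^2\sigma_1^2\sigma_2^2$, the variance of $x^2$ and the covariance of $x^2$ and $y^2$.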


Sampling of Regression and Correlation Coefficients 


14.10. We now turn to consider the sampling problems associated with the coefficients of correlation and regression.

First of all, as to standard errors. In Example 9.6 on page 211 we have anticipated the determination of the sampling variance of the correlation coefficient itself, obtaining the result for the normal case
\[ \operatorname{var} r = \frac{(1-\rho^2)^2}{n}. \tag{14.23} \]
Here, as usual, the Roman r is written for the value of $\rho$ in the sample and n is the number in the sample. The result of (14.23) is not of great value, since the distribution of r tends to normality very slowly if $\rho$ is not close to zero. It is probably as well not to use (14.23) unless n is greater than 500.


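In the manner of Chapter 9 we have
\[ b_2 = \frac{m_{11}}{m_{20}}, \qquad \frac{\delta b_2}{b_2} = \frac{\delta m_{11}}{m_{11}} - \frac{\delta m_{20}}{m_{20}}, \]
giving
\[ \frac{\operatorname{var} b_2}{b_2^2} = \frac{\operatorname{var} m_{11}}{m_{11}^2} + \frac{\operatorname{var} m_{20}}{m_{20}^2} - \frac{2\operatorname{cov}(m_{11}, m_{20})}{m_{11} m_{20}}. \tag{14.24} \]
Substituting from (9.16) and (9.17), and writing the sampling values m instead of the parent $\mu$'s, we have
\[ \operatorname{var} b_2 = \frac{b_2^2}{n}\left(\frac{m_{22}}{m_{11}^2} + \frac{m_{40}}{m_{20}^2} - \frac{2m_{31}}{m_{11} m_{20}}\right) \tag{14.25} \]
or, for the normal case, on using the values of Example 3.15,
\[ \operatorname{var} b_2 = \frac{b_2^2}{n}\left(\frac{1}{r^2} - 1\right) = \frac{1}{n}\,\frac{\operatorname{var} y}{\operatorname{var} x}\,(1 - r^2). \tag{14.26} \]
Similarly
\[ \operatorname{var} b_1 = \frac{1}{n}\,\frac{\operatorname{var} x}{\operatorname{var} y}\,(1 - r^2). \tag{14.27} \]
To our order of approximation it is indifferent whether we write $1 - r^2$ or $1 - \rho^2$ on the right-hand side of these equations.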
Example 14.5
In the data of Table 14.3 (height of fathers and height of sons) we find r = 0.51 for n = 1078. From inspection of the table we see that the distribution is close to the normal type, and in this case n is large enough to justify the use of the standard error. We have then
\[ \operatorname{var} r = \frac{\{1 - (0.51)^2\}^2}{1078} = 0.000508. \]
Thus the standard error is about 0.023. The correlation is thus undoubtedly significant, if the data were obtained by random sampling. It is improbable that the parent correlation $\rho$ lies outside the range 0.51 ± 0.05, and very improbable that it lies outside the range 0.51 ± 0.075.
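The standard-error calculation of Example 14.5 takes only a few lines. This is a minimal sketch assuming the large-sample normal-case formula (14.23), with the sample r substituted for $\rho$; the function name is an illustrative choice.

```python
from math import sqrt


def var_r_large_sample(rho, n):
    """Large-sample variance of r for a normal parent, as in (14.23)."""
    return (1 - rho ** 2) ** 2 / n


variance = var_r_large_sample(0.51, 1078)
std_error = sqrt(variance)

# Ranges of roughly two and three standard errors about the observed value.
two_se = (0.51 - 2 * std_error, 0.51 + 2 * std_error)
three_se = (0.51 - 3 * std_error, 0.51 + 3 * std_error)
print(round(variance, 6), round(std_error, 4), two_se, three_se)
```

The ranges 0.51 ± 0.05 and 0.51 ± 0.075 in the example correspond to a little over two and a little over three standard errors respectively.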


Estimates of Correlation and Regression Coefficients in Normal Samples


14.11. In large-sample theory the sample values of the correlation and regression coefficients may be taken as estimates of the population values in the usual way. They can also be used in small-sample theory and it may, in fact, be shown that they are estimates giving maximum likelihood to samples from a bivariate normal population.


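The joint probability of n sample values $(x_1, y_1), \ldots, (x_n, y_n)$ from a bivariate normal population with means $m_1$ and $m_2$ is
\[ dF = \frac{1}{(2\pi)^n \sigma_1^n \sigma_2^n (1-\rho^2)^{n/2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{\Sigma(x - m_1)^2}{\sigma_1^2} - 2\rho\frac{\Sigma(x - m_1)(y - m_2)}{\sigma_1\sigma_2} + \frac{\Sigma(y - m_2)^2}{\sigma_2^2}\right\}\right] dx_1\,dy_1 \ldots dx_n\,dy_n. \tag{14.28} \]
The likelihood function may then be written
\[ L = \frac{1}{\sigma_1^n \sigma_2^n (1-\rho^2)^{n/2}} \exp\left[-\frac{1}{2(1-\rho^2)}(A - 2\rho B + C)\right], \tag{14.29} \]
where
\[ A = \frac{\Sigma(x - m_1)^2}{\sigma_1^2}, \qquad B = \frac{\Sigma(x - m_1)(y - m_2)}{\sigma_1\sigma_2}, \qquad C = \frac{\Sigma(y - m_2)^2}{\sigma_2^2}, \]
and thus, for the maximisation of log L we have
\[ \frac{1}{L}\frac{\partial L}{\partial m_1} = \frac{1}{1-\rho^2}\left\{\frac{\Sigma(x - m_1)}{\sigma_1^2} - \rho\frac{\Sigma(y - m_2)}{\sigma_1\sigma_2}\right\} = 0, \]
giving
\[ \frac{\Sigma(x - m_1)}{\sigma_1} - \rho\frac{\Sigma(y - m_2)}{\sigma_2} = 0. \tag{14.30} \]
Similarly from $\dfrac{1}{L}\dfrac{\partial L}{\partial m_2} = 0$ we have
\[ \frac{\Sigma(y - m_2)}{\sigma_2} - \rho\frac{\Sigma(x - m_1)}{\sigma_1} = 0. \tag{14.31} \]
Thus from (14.30) and (14.31), $\rho$ not being unity in general,
\[ \Sigma(x - m_1) = \Sigma(y - m_2) = 0 \]
or
\[ m_1 = \frac{1}{n}\Sigma(x), \qquad m_2 = \frac{1}{n}\Sigma(y), \tag{14.32} \]
so that our estimates of the means are the means of the sample.

We also find, equating $\dfrac{\partial}{\partial\sigma_1}\log L$, $\dfrac{\partial}{\partial\sigma_2}\log L$ and $\dfrac{\partial}{\partial\rho}\log L$ to zero and cancelling factors in $\sigma_1$, $\sigma_2$ and $(1-\rho^2)$ respectively,
\[
\begin{aligned}
-n + \frac{1}{1-\rho^2}(A - \rho B) &= 0, \\
-n + \frac{1}{1-\rho^2}(-\rho B + C) &= 0, \\
n\rho + B - \frac{\rho}{1-\rho^2}(A - 2\rho B + C) &= 0,
\end{aligned} \tag{14.33}
\]
whence
\[ A = C = n, \qquad B = n\rho, \]
giving for the estimates of $\sigma_1^2$, $\sigma_2^2$ and $\rho$
\[ \sigma_1^2 = \frac{\Sigma(x - m_1)^2}{n}, \qquad \sigma_2^2 = \frac{\Sigma(y - m_2)^2}{n}, \qquad \rho = \frac{\Sigma(x - m_1)(y - m_2)}{\{\Sigma(x - m_1)^2\,\Sigma(y - m_2)^2\}^{1/2}}, \]
which, on substituting the estimates of $m_1$ and $m_2$ given by (14.32), become the sample variances and correlation coefficient.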


Distribution of Sample Means, Variances and Covariance in Normal Samples

14.12. In accordance with the result of the previous section we will take as estimates of the parameters $m_1$, $m_2$, $\sigma_1^2$, $\sigma_2^2$ and $\rho$ the corresponding sample values $\bar{x}$, $\bar{y}$, $s_1^2$, $s_2^2$ and r. The joint distribution of the sample values is given by (14.28) and it is remarkable that the exponent in that expression can be expressed solely in terms of the five parameters and their estimates. We have, in fact,
\[
\begin{aligned}
&\frac{\Sigma(x - m_1)^2}{\sigma_1^2} - 2\rho\frac{\Sigma(x - m_1)(y - m_2)}{\sigma_1\sigma_2} + \frac{\Sigma(y - m_2)^2}{\sigma_2^2} \\
&\quad= \frac{\Sigma\{(x - \bar{x}) + (\bar{x} - m_1)\}^2}{\sigma_1^2} + \text{two similar terms} \\
&\quad= \frac{\Sigma(x - \bar{x})^2}{\sigma_1^2} + \frac{n(\bar{x} - m_1)^2}{\sigma_1^2} + \text{four similar terms} \\
&\quad= \frac{n s_1^2 + n(\bar{x} - m_1)^2}{\sigma_1^2} + \text{two similar terms} \\
&\quad= n\left\{\frac{(\bar{x} - m_1)^2}{\sigma_1^2} - 2\rho\frac{(\bar{x} - m_1)(\bar{y} - m_2)}{\sigma_1\sigma_2} + \frac{(\bar{y} - m_2)^2}{\sigma_2^2}\right\} + n\left\{\frac{s_1^2}{\sigma_1^2} - \frac{2\rho r s_1 s_2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right\},
\end{aligned} \tag{14.35}
\]
the product terms vanishing in the second step because $\Sigma(x - \bar{x}) = \Sigma(y - \bar{y}) = 0$.

We proceed to find the joint distribution of the five statistics, and to do so require to express the frequency element (14.28) in terms of them. The non-differential part of that element is given by the exponential of (14.35). It remains to express in the requisite form the volume element $dx_1 \ldots dx_n\,dy_1 \ldots dy_n$.

Generalising the geometrical approach of Chapter 10, we may imagine a sample space of 2n dimensions, n for x and n for y. The sample point may vary in the x-space and the y-space, but not independently so. In fact, if P represents the point $(x_1, \ldots, x_n)$ in the x-space and Q the point $(y_1, \ldots, y_n)$ in the y-space, and if $O_1$, $O_2$ are the points $(\bar{x}, \ldots, \bar{x})$, $(\bar{y}, \ldots, \bar{y})$, then for any given r we have
\[ r = \frac{\Sigma(x - \bar{x})(y - \bar{y})}{\{\Sigma(x - \bar{x})^2\,\Sigma(y - \bar{y})^2\}^{1/2}} \]
and thus r is the cosine of the angle, say $\theta$, between $PO_1$ and $QO_2$, so that if P and r are fixed Q varies on the cone in the y-space obtained by rotating $O_2Q$ such that the angle made with $O_1P$ is constant.

The element in the x-space is proportional to $s_1^{n-2}\,ds_1\,d\bar{x}$, as was seen in Example 10.5. For given r, $\bar{y}$ and $s_2$ the point Q varies on the zone of the hypersphere of radius $s_2\sqrt{n}$, centre $O_2$ and (n - 1) dimensions. This zone has radius $s_2\sqrt{n}\sin\theta = s_2\sqrt{n}\,(1 - r^2)^{1/2}$ and width $s_2\sqrt{n}\,d\theta = \dfrac{s_2\sqrt{n}\,dr}{\sqrt{1 - r^2}}$, and thus its content is proportional to
\[ \left\{s_2\sqrt{n}\,(1 - r^2)^{1/2}\right\}^{n-3}\frac{s_2\sqrt{n}\,dr}{\sqrt{1 - r^2}}, \]
that is, to $s_2^{n-2}(1 - r^2)^{(n-4)/2}\,dr$.

Thus the volume element may be written
\[
\begin{aligned}
dv &\propto s_1^{n-2}\,ds_1\,d\bar{x}\; s_2^{n-2}\,ds_2\,d\bar{y}\,(1 - r^2)^{(n-4)/2}\,dr \\
&\propto s_1^{n-2} s_2^{n-2} (1 - r^2)^{(n-4)/2}\,d\bar{x}\,d\bar{y}\,ds_1\,ds_2\,dr
\end{aligned} \tag{14.36}
\]
and the joint frequency element of the five variables is then proportional to
\[ \exp\left[-\frac{n}{2(1-\rho^2)}\left\{\frac{(\bar{x} - m_1)^2}{\sigma_1^2} - \frac{2\rho(\bar{x} - m_1)(\bar{y} - m_2)}{\sigma_1\sigma_2} + \frac{(\bar{y} - m_2)^2}{\sigma_2^2} + \frac{s_1^2}{\sigma_1^2} - \frac{2\rho r s_1 s_2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right\}\right] dv. \tag{14.37} \]
This fundamental result is due to R. A. Fisher (1915).


14.13. One important property of (14.37) may be remarked. The distribution may be factorised into two parts, one containing only $\bar{x}$ and $\bar{y}$ and the other only $s_1$, $s_2$ and r, namely (except for constants)
\[ dF \propto \exp\left[-\frac{n}{2(1-\rho^2)}\left\{\frac{(\bar{x} - m_1)^2}{\sigma_1^2} - \frac{2\rho(\bar{x} - m_1)(\bar{y} - m_2)}{\sigma_1\sigma_2} + \frac{(\bar{y} - m_2)^2}{\sigma_2^2}\right\}\right] d\bar{x}\,d\bar{y} \tag{14.38} \]
and
\[ dF \propto \exp\left[-\frac{n}{2(1-\rho^2)}\left\{\frac{s_1^2}{\sigma_1^2} - \frac{2\rho r s_1 s_2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right\}\right] s_1^{n-2} s_2^{n-2} (1 - r^2)^{(n-4)/2}\,ds_1\,ds_2\,dr. \tag{14.39} \]
Thus we see that in normal samples the distribution of means is entirely independent of that of variances and covariance.

Before leaving (14.38) we may also note that the means are themselves distributed in the bivariate normal form, with mean $(\bar{x}) = m_1$, mean $(\bar{y}) = m_2$, var $(\bar{x}) = \sigma_1^2/n$, var $(\bar{y}) = \sigma_2^2/n$ (all of which results are already familiar), and
\[ \operatorname{cov}(\bar{x}, \bar{y}) = \frac{\rho\sigma_1\sigma_2}{n}, \tag{14.40} \]
so that the correlation between $\bar{x}$ and $\bar{y}$ is $\rho$, the correlation in the parent population.


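14.14. We may now use (14.39) to obtain the distribution of the correlation coefficient, namely, by integrating with respect to $s_1$ and $s_2$ from 0 to $\infty$. Let us first of all evaluate the constant to be attached to it from the consideration that $\int dF = 1$.

Make the variate transformation
\[ a = \frac{s_1^2}{\sigma_1^2}\cdot\frac{n}{2(1-\rho^2)}, \qquad b = \frac{r s_1 s_2}{\sigma_1\sigma_2}\cdot\frac{n}{2(1-\rho^2)}, \qquad c = \frac{s_2^2}{\sigma_2^2}\cdot\frac{n}{2(1-\rho^2)}. \tag{14.41} \]
We have for the Jacobian of the transformation
\[
\frac{\partial(a, b, c)}{\partial(s_1, r, s_2)} =
\begin{vmatrix}
\dfrac{2s_1}{\sigma_1^2}\dfrac{n}{2(1-\rho^2)} & 0 & 0 \\[2ex]
\dfrac{r s_2}{\sigma_1\sigma_2}\dfrac{n}{2(1-\rho^2)} & \dfrac{s_1 s_2}{\sigma_1\sigma_2}\dfrac{n}{2(1-\rho^2)} & \dfrac{r s_1}{\sigma_1\sigma_2}\dfrac{n}{2(1-\rho^2)} \\[2ex]
0 & 0 & \dfrac{2s_2}{\sigma_2^2}\dfrac{n}{2(1-\rho^2)}
\end{vmatrix}
= \frac{s_1^2 s_2^2 n^3}{2\sigma_1^3\sigma_2^3(1-\rho^2)^3} = \frac{2acn}{\sigma_1\sigma_2(1-\rho^2)}
\]
and also the relation
\[ r^2 = \frac{b^2}{ac}. \]
The integral then becomes
\[
\begin{aligned}
&\int \exp[-a + 2\rho b - c]\left\{\frac{2a\sigma_1^2(1-\rho^2)}{n}\right\}^{(n-2)/2}\left\{\frac{2c\sigma_2^2(1-\rho^2)}{n}\right\}^{(n-2)/2}\left(1 - \frac{b^2}{ac}\right)^{(n-4)/2}\frac{\sigma_1\sigma_2(1-\rho^2)}{2acn}\,da\,db\,dc \\
&\quad= \frac{2^{n-3}\sigma_1^{n-1}\sigma_2^{n-1}(1-\rho^2)^{n-1}}{n^{n-1}}\int \exp[-a + 2\rho b - c]\,(ac - b^2)^{(n-4)/2}\,da\,db\,dc
\end{aligned} \tag{14.42}
\]
where the limits of a and c are 0 to $\infty$ and those of b are $\pm\sqrt{ac}$. This integral may be evaluated in terms of the $\Gamma$-function. Putting $\xi = a - \dfrac{b^2}{c}$ we find
\[
\begin{aligned}
&\iiint \exp\left[-\xi - \frac{b^2}{c} + 2\rho b - c\right]\xi^{(n-4)/2} c^{(n-4)/2}\,d\xi\,db\,dc, \qquad 0 \le \xi \le \infty,\; -\infty \le b \le \infty,\; 0 \le c \le \infty \\
&\quad= \Gamma\left(\frac{n-2}{2}\right)\iint \exp\left[-\frac{1}{c}\left\{(b - \rho c)^2 + (1-\rho^2)c^2\right\}\right] c^{(n-4)/2}\,db\,dc \\
&\quad= \Gamma\left(\frac{n-2}{2}\right)\sqrt{\pi}\int \exp\{-(1-\rho^2)c\}\,c^{(n-3)/2}\,dc \\
&\quad= \frac{\sqrt{\pi}\,\Gamma\left(\frac{n-2}{2}\right)\Gamma\left(\frac{n-1}{2}\right)}{(1-\rho^2)^{(n-1)/2}}.
\end{aligned} \tag{14.43}
\]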


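Collecting up the terms from (14.42) and (14.43) we find for the joint distribution of $s_1$, $s_2$ and r
\[ dF = \frac{n^{n-1}}{\pi\sigma_1^{n-1}\sigma_2^{n-1}(1-\rho^2)^{(n-1)/2}\,\Gamma(n-2)}\exp\left[-\frac{n}{2(1-\rho^2)}\left\{\frac{s_1^2}{\sigma_1^2} - \frac{2\rho r s_1 s_2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right\}\right] s_1^{n-2} s_2^{n-2}(1 - r^2)^{(n-4)/2}\,ds_1\,ds_2\,dr. \tag{14.44} \]
Now put
\[ \zeta = \frac{s_1 s_2}{\sigma_1\sigma_2}, \qquad z = \log\frac{s_1\sigma_2}{s_2\sigma_1}, \]
\[
\frac{\partial(\zeta, z, r)}{\partial(s_1, s_2, r)} =
\begin{vmatrix}
\dfrac{s_2}{\sigma_1\sigma_2} & \dfrac{s_1}{\sigma_1\sigma_2} & 0 \\[2ex]
\dfrac{1}{s_1} & -\dfrac{1}{s_2} & 0 \\[2ex]
0 & 0 & 1
\end{vmatrix}
= -\frac{2}{\sigma_1\sigma_2}.
\]
The exponent in (14.44) becomes
\[ \exp\left[-\frac{n}{2(1-\rho^2)}\left(\zeta e^{z} - 2\rho r\zeta + \zeta e^{-z}\right)\right] \]
and after a little reduction the distribution becomes, the range of z being folded on to 0 to $\infty$ since the integrand is even in z,
\[ dF = \frac{n^{n-1}}{\pi(1-\rho^2)^{(n-1)/2}\,\Gamma(n-2)}\exp\left[-\frac{n\zeta}{1-\rho^2}(\cosh z - \rho r)\right]\zeta^{n-2}\,d\zeta\,dz\,(1 - r^2)^{(n-4)/2}\,dr. \]
On integration with respect to $\zeta$ we have
\[ dF = \frac{(1-\rho^2)^{(n-1)/2}\,\Gamma(n-1)}{\pi\,\Gamma(n-2)}\,(1 - r^2)^{(n-4)/2}\,\frac{dz\,dr}{(\cosh z - \rho r)^{n-1}}. \]
Putting $-\rho r = \cos\theta$ we have, since
\[ \int_0^\infty \frac{dz}{\cosh z + \cos\theta} = \frac{\theta}{\sin\theta}, \]
\[
\begin{aligned}
dF &= \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\,(1 - r^2)^{(n-4)/2}\,\frac{d^{\,n-2}}{d(-\cos\theta)^{n-2}}\left(\frac{\theta}{\sin\theta}\right) dr \\
&= \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\,(1 - r^2)^{(n-4)/2}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\cos^{-1}(-\rho r)}{\sqrt{1 - \rho^2 r^2}}\right\} dr.
\end{aligned} \tag{14.45}
\]
This is as simple a form as can be given in terms of elementary functions.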


14.15. In the particular case $\rho = 0$, (14.45) reduces to
\[ dF = \frac{1}{B\left(\frac{n-2}{2}, \frac{1}{2}\right)}\,(1 - r^2)^{(n-4)/2}\,dr, \tag{14.46} \]
a form surmised by "Student" in 1908. This distribution provides a test of the hypothesis that an observed r arose from an uncorrelated normal population. Its distribution function may be obtained from incomplete B-functions, or more conveniently by putting
\[ t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}, \tag{14.47} \]
which transforms (14.46) to
\[ dF = \frac{1}{\sqrt{n-2}\;B\left(\frac{n-2}{2}, \frac{1}{2}\right)}\,\frac{dt}{\left(1 + \dfrac{t^2}{n-2}\right)^{(n-1)/2}}. \tag{14.48} \]
The integral of this function has been tabulated and is given as Appendix Table 3. Fisher and Yates have also tabulated some of the significance points of t and of r itself, i.e. the values of r (for various $\nu$) for which the distribution function takes specified values.


14.16. The general distribution (14.45) has been studied in some detail, but lack of space prevents the inclusion of the extensive analysis involved. We will here indicate only the more important features of the results.

First, as to the shape of the frequency curves. When n = 2 the distribution of (14.45) becomes nugatory because of the factor $\Gamma(n - 2)$. This is understandable because, for samples of two, r must be either +1 or -1. In such a case, then, we have a discontinuous distribution


\[ f(1) = \tfrac{1}{2} + \frac{1}{\pi}\sin^{-1}\rho, \qquad f(-1) = \tfrac{1}{2} - \frac{1}{\pi}\sin^{-1}\rho. \tag{14.49} \]


We may regard this as an extreme case of a U-shaped distribution.

When n = 3 we find
\[ y = \frac{y_0}{\sqrt{1 - r^2}}\cdot\frac{1}{\sin^2\theta}\,(1 - \theta\cot\theta), \tag{14.50} \]
again a U-shaped distribution. For n = 4,
\[ y = \frac{y_0}{\sin^3\theta}\,\{\theta - 3\cot\theta + 3\theta\cot^2\theta\}. \tag{14.51} \]
If $\rho = 0$ this reduces to the rectangular form $y = \tfrac{1}{2}$. In other cases the curve is J-shaped, increasing from a minimum at r = -1 to a maximum (but not an infinite maximum) at r = +1.

For n > 4 the frequency curves are unimodal and tend to normality with large n, though slowly. Some interesting photographs of models of these curves are given in the "Co-operative Study" (1917).


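14.17. The moments of the distribution are expressible in terms of hypergeometric functions. Returning to (14.44) let us write
\[ \alpha_1^2 = \frac{\sigma_1^2(1-\rho^2)}{n}, \qquad \alpha_2^2 = \frac{\sigma_2^2(1-\rho^2)}{n}. \]
After a little rearrangement the distribution becomes
\[ dF = \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)\,\alpha_1^{n-1}\alpha_2^{n-1}}\exp\left[-\frac{1}{2}\left(\frac{s_1^2}{\alpha_1^2} - \frac{2\rho r s_1 s_2}{\alpha_1\alpha_2} + \frac{s_2^2}{\alpha_2^2}\right)\right] s_1^{n-2} s_2^{n-2}(1 - r^2)^{(n-4)/2}\,ds_1\,ds_2\,dr. \tag{14.52} \]
Putting $u_1 = \dfrac{s_1}{\alpha_1}$, $u_2 = \dfrac{s_2}{\alpha_2}$ and expanding the term in $\exp\left(\dfrac{\rho r s_1 s_2}{\alpha_1\alpha_2}\right)$, we have
\[ dF = \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\exp\left(-\tfrac{1}{2}u_1^2 - \tfrac{1}{2}u_2^2\right) u_1^{n-2} u_2^{n-2}(1 - r^2)^{(n-4)/2}\sum_{j=0}^{\infty}\frac{(\rho r u_1 u_2)^j}{j!}\,du_1\,du_2\,dr. \tag{14.53} \]
Integrating for $u_2$ from 0 to $\infty$ we find for the distribution of $u_1$ and r
\[ dF = \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\exp\left(-\tfrac{1}{2}u_1^2\right) u_1^{n-2}(1 - r^2)^{(n-4)/2}\sum_{j=0}^{\infty}\frac{(\rho r u_1)^j}{j!}\,2^{(n+j-3)/2}\,\Gamma\left(\frac{n+j-1}{2}\right) du_1\,dr. \tag{14.54} \]
Multiplying by r and integrating from -1 to +1 we find
\[ \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\exp\left(-\tfrac{1}{2}u_1^2\right) u_1^{n-2}\,du_1\sum_{j=0}^{\infty}\frac{(\rho u_1)^{2j+1}}{(2j+1)!}\,2^{(n+2j-2)/2}\,\Gamma\left(\frac{n+2j}{2}\right) B\left(\frac{n-2}{2},\, j + \frac{3}{2}\right) \]
and on integrating with respect to $u_1$ we obtain
\[ \mu_1'(r) = \frac{(1-\rho^2)^{(n-1)/2}}{\pi\,\Gamma(n-2)}\sum_{j=0}^{\infty}\frac{\rho^{2j+1}}{(2j+1)!}\,2^{\,n+2j-2}\,\Gamma^2\left(\frac{n+2j}{2}\right) B\left(\frac{n-2}{2},\, j + \frac{3}{2}\right). \]
Remembering that
\[ \Gamma(x)\,\Gamma(x + \tfrac{1}{2}) = \frac{\sqrt{\pi}\,\Gamma(2x)}{2^{2x-1}}, \]
we find
\[
\begin{aligned}
\mu_1'(r) &= \frac{(1-\rho^2)^{(n-1)/2}\,\rho\,\Gamma^2(\tfrac{1}{2}n)}{\Gamma\{\tfrac{1}{2}(n-1)\}\,\Gamma\{\tfrac{1}{2}(n+1)\}}\left\{1 + \frac{\tfrac{1}{2}n\cdot\tfrac{1}{2}n}{1!\,\tfrac{1}{2}(n+1)}\rho^2 + \frac{\tfrac{1}{2}n(\tfrac{1}{2}n+1)\cdot\tfrac{1}{2}n(\tfrac{1}{2}n+1)}{2!\,\tfrac{1}{2}(n+1)\{\tfrac{1}{2}(n+1)+1\}}\rho^4 + \ldots\right\} \\
&= \frac{(1-\rho^2)^{(n-1)/2}\,\rho\,\Gamma^2(\tfrac{1}{2}n)}{\Gamma\{\tfrac{1}{2}(n-1)\}\,\Gamma\{\tfrac{1}{2}(n+1)\}}\,F(\tfrac{1}{2}n, \tfrac{1}{2}n, \tfrac{1}{2}(n+1), \rho^2)
\end{aligned}
\]
and since $F(\alpha, \beta, \gamma, x) = (1 - x)^{\gamma-\alpha-\beta}F(\gamma-\alpha, \gamma-\beta, \gamma, x)$,
\[ \mu_1'(r) = \frac{\rho\,\Gamma^2(\tfrac{1}{2}n)}{\Gamma\{\tfrac{1}{2}(n-1)\}\,\Gamma\{\tfrac{1}{2}(n+1)\}}\,F(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}(n+1), \rho^2). \tag{14.55} \]
In a similar way it may be shown that
\[ \mu_2'(r) = 1 - (1-\rho^2)\,\frac{n-2}{n-1}\,F(1, 1, \tfrac{1}{2}(n+1), \rho^2). \tag{14.56} \]
These series converge fairly rapidly for moderate or large n.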
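The series (14.55) and (14.56) are easily summed numerically. The sketch below is an illustration under stated assumptions: `hyp2f1` is a hand-rolled partial sum of the hypergeometric series with a fixed number of terms, and the function names are hypothetical; n is the sample size as in the text.

```python
from math import exp, lgamma


def hyp2f1(a, b, c, x, terms=200):
    """Partial sum of the hypergeometric series F(a, b, c, x), for |x| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
    return total


def mean_r(rho, n):
    """First moment of r about zero, following (14.55)."""
    log_const = 2 * lgamma(n / 2) - lgamma((n - 1) / 2) - lgamma((n + 1) / 2)
    return rho * exp(log_const) * hyp2f1(0.5, 0.5, (n + 1) / 2, rho ** 2)


def mean_r_squared(rho, n):
    """Second moment of r about zero, following (14.56)."""
    return 1 - (1 - rho ** 2) * (n - 2) / (n - 1) * hyp2f1(1, 1, (n + 1) / 2, rho ** 2)


m1 = mean_r(0.5, 20)
variance = mean_r_squared(0.5, 20) - m1 ** 2
print(m1, variance, (1 - 0.5 ** 2) ** 2 / 20)
```

For $\rho = 0$ the mean vanishes and the second moment reduces to 1/(n - 1); for $\rho = 0.5$ and n = 20 the mean of r falls a little short of $\rho$, and the variance is of the same order as the large-sample value (14.23).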


14.18. The ordinates and distribution function of the correlation coefficient are not expressible in terms of simple mathematical functions. They have, however, been tabulated by David (1938) for values of n = 2(1)25, 50, 100, 200 and 400; for $\rho$ = 0.0(0.1)0.9; and for r = -1.00(0.05)+1.00, with finer intervals in places.

For many practical purposes it is sufficient to use a transformation of the distribution due to Fisher (1921). Putting
\[ r = \tanh z, \qquad z = \tfrac{1}{2}\log\frac{1+r}{1-r}, \]
\[ \rho = \tanh\zeta, \qquad \zeta = \tfrac{1}{2}\log\frac{1+\rho}{1-\rho}, \tag{14.57} \]

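we may expand the frequency function of z in powers of $z - \zeta$, = x say, and inverse powers of n. Fisher gives the following expansion:
\[
\begin{aligned}
dF = \frac{n-2}{\sqrt{2\pi(n-1)}}\,e^{-\frac{1}{2}(n-1)x^2}\Biggl[\,&1 + \tfrac{1}{2}\rho x + \left\{\frac{2+\rho^2}{8(n-1)} + \frac{4-\rho^2}{8}x^2 + \frac{n-1}{12}x^4\right\} \\
&+ \rho x\left\{\frac{4-\rho^2}{16(n-1)} + \frac{4+3\rho^2}{48}x^2 + \frac{n-1}{24}x^4\right\} \\
&+ \left\{\frac{4+12\rho^2+9\rho^4}{128(n-1)^2} + \frac{8-2\rho^2+3\rho^4}{64(n-1)}x^2 + \frac{8+4\rho^2-5\rho^4}{128}x^4 + \frac{28-15\rho^2}{1440}(n-1)x^6 + \frac{(n-1)^2}{288}x^8\right\} + \ldots\Biggr]dx.
\end{aligned} \tag{14.58}
\]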


Taking moments about x = 0 we find, on transferring to the mean,


cce rae rud 
f= at Teo ЫЕ ) 


2 (14.59) 
і p.E—p' , 176 — 2102 91 
PA E № "(в — 1) 48(n —18 К>. |o (14.60) 
„е ы 
i a Са SE Lg 
С 51 224 — 48р? — 3p | 1472 — 29852 _ 141 pt — 398 
ка C= m * йе F 32(n — 1)2 dese ] - (14.62) 


The remarkable thing about the transformation is that the distribution of r, which is very skew, becomes the distribution of $z - \zeta$, which is nearly symmetrical. In fact,

yi B mei —ma$)-L i. Я А - (14.63) 
_ 82 — 3р4 


me 16(% — 1) dues . + (14.64) 


Thus we may take $z - \zeta$ to be approximately normally distributed with mean and variance given by (14.59) and (14.60). As a slightly rougher approximation we may take
\[ \operatorname{mean}(z - \zeta) = \frac{\rho}{2(n-1)}, \qquad \operatorname{var}(z - \zeta) = \frac{1}{n-1} + \frac{4-\rho^2}{2(n-1)^2}, \tag{14.65} \]
which is approximately equal for small $\rho$ to
\[ \frac{1}{n-1}\left\{1 + \frac{2}{n-1}\right\} = \frac{1}{n-3} \text{ approximately.} \tag{14.66} \]
When n is moderate we may take a still rougher approximation by assuming $z - \zeta$ to be normally distributed about zero mean with variance $\dfrac{1}{n-3}$. Some comparisons of the various approximations are given in the introduction to David's tables, and it appears that for n > 50 the forms (14.65) and (14.66) are adequate. The approximation given by (14.59) and (14.60) appears to hold satisfactorily for values of n as low as 11.


14.19. Except in the case of the normal parent very little exact knowledge is available about the sampling distribution of the correlation coefficient. There is, however, some empirical evidence to justify the use of the above results when the population does not differ very much from the normal. E. S. Pearson (1931), in dealing with some experimental results, concluded that "the results suggest that the normal bivariate surface can be mutilated and distorted to a remarkable degree without affecting the frequency distribution of r." The subject does not seem to have been investigated mathematically except in special cases.


Example 14.6 


The question then is, can such a va 
the yields of wheat and potatoes 
From prior knowledge of crop yields we can assume with some confidence approximate normality in the parent population. Let us then test the hypothesis that the correlation in this population is zero, so that $\zeta = 0$.
We have
\[ z = \tanh^{-1} 0.2189 = 0.2225, \qquad \frac{1}{\sqrt{n-3}} = \frac{1}{\sqrt{45}} = 0.1491. \]
The deviation $z - \zeta$ is thus 0.2225, or about 1.49 times the standard error. This is not very improbable and the observed correlation may thus be accidental.


Example 14.7 


In a sample of 50 a correlation coefficient is found to be +0.5. What is the probability that a value equal to or less than this should have been obtained from a normal population in which the correlation is +0.7?

The exact value, from David's table, is, to five decimal places, 0.01289. Let us first of all take the approximation which assumes $z - \zeta$ to be distributed about zero mean with variance $\dfrac{1}{n-3}$. We have
\[ z = \tfrac{1}{2}\log_e\frac{1+r}{1-r} = 0.54931, \qquad \zeta = \tfrac{1}{2}\log_e\frac{1+\rho}{1-\rho} = 0.86730, \qquad \frac{1}{\sqrt{n-3}} = 0.1459. \]
The deviation is thus 2.18 times the standard error, and the required probability, from the table of the normal integral, is 0.0146 approximately, compared with the true value of 0.0129. The approximate test is not quite stringent enough.

Let us then take $z - \zeta$ to be distributed normally about mean $\dfrac{\rho}{2(n-1)} = 0.00714$. The deviation is then -0.3251, or 2.23 times the standard error, giving a probability of about 0.0129, almost the exact value.
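Both approximations of Example 14.7 can be reproduced with the standard library alone. This is a sketch assuming the variance 1/(n - 3) for $z - \zeta$ and, optionally, the mean correction $\rho/\{2(n-1)\}$; the function names and the flag `use_mean_correction` are illustrative choices.

```python
from math import atanh, erfc, sqrt


def normal_cdf(x):
    """Distribution function of the standard normal."""
    return 0.5 * erfc(-x / sqrt(2))


def lower_tail_prob(r, rho, n, use_mean_correction=False):
    """Approximate P(sample correlation <= r) via Fisher's z-transformation."""
    z, zeta = atanh(r), atanh(rho)
    mean = rho / (2 * (n - 1)) if use_mean_correction else 0.0
    return normal_cdf((z - zeta - mean) * sqrt(n - 3))


p_rough = lower_tail_prob(0.5, 0.7, 50)
p_better = lower_tail_prob(0.5, 0.7, 50, use_mean_correction=True)
print(round(p_rough, 4), round(p_better, 4))
```

The first figure corresponds to the rougher approximation and the second to the one allowing for the mean of $z - \zeta$; the exact tabulated value quoted in the example is 0.01289.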


Example 14.8 


In a sample of $n_1$ there is observed a correlation of $r_1$, and in a second sample of $n_2$ a correlation of $r_2$. Are the sample values $r_1$ and $r_2$ compatible with the hypothesis that the samples arose from the same population?

Suppose the hypothesis were true, and that $\rho$ is the correlation coefficient in the population. Then if $z_1 = \tanh^{-1} r_1$, $z_2 = \tanh^{-1} r_2$, $\zeta = \tanh^{-1}\rho$, we know that if the population were normal, $z_1 - \zeta$ will be distributed approximately normally with variance $\dfrac{1}{n_1 - 3}$ and $z_2 - \zeta$ with variance $\dfrac{1}{n_2 - 3}$. Thus the difference $z_1 - z_2 = (z_1 - \zeta) - (z_2 - \zeta)$ is distributed approximately normally with variance
\[ \frac{1}{n_1 - 3} + \frac{1}{n_2 - 3} \]
and this will provide a test of the hypothesis.
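The test of Example 14.8 may be sketched as follows. The function name and the sample figures (0.6 and 0.3 from samples of 28 each) are hypothetical values chosen for illustration; the two-sided probability is taken from the normal integral.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Ratio of z1 - z2 to its standard error, and the two-sided normal probability."""
    diff = atanh(r1) - atanh(r2)
    std_error = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    ratio = diff / std_error
    p_two_sided = erfc(abs(ratio) / sqrt(2))
    return ratio, p_two_sided


# Hypothetical samples: r1 = 0.6 from 28 observations, r2 = 0.3 from 28 observations.
ratio, p_value = compare_correlations(0.6, 28, 0.3, 28)
print(round(ratio, 3), round(p_value, 3))
```

With these figures the difference is about 1.36 times its standard error, so samples as small as these would not distinguish the two correlations with any confidence.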


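Distribution of Regression Coefficients in Normal Samples

14.20. Turning again to equation (14.44) we have, substituting $b_2 = \dfrac{r s_2}{s_1}$, the joint frequency-distribution of $s_1$, $s_2$ and $b_2$
\[ dF \propto \exp\left[-\frac{n}{2(1-\rho^2)}\left\{\frac{s_1^2}{\sigma_1^2} - \frac{2\rho s_1^2 b_2}{\sigma_1\sigma_2} + \frac{s_2^2}{\sigma_2^2}\right\}\right] s_1^{n-1} s_2^{n-3}\left(1 - \frac{s_1^2 b_2^2}{s_2^2}\right)^{(n-4)/2} ds_1\,ds_2\,db_2. \tag{14.67} \]
Integration with respect to $s_2$ gives for the distribution of $s_1$ and $b_2$
\[ dF \propto \exp\left[-\frac{n s_1^2}{2(1-\rho^2)}\left\{\frac{1}{\sigma_1^2} - \frac{2\rho b_2}{\sigma_1\sigma_2} + \frac{b_2^2}{\sigma_2^2}\right\}\right] s_1^{n-1}\,ds_1\,db_2. \]
A further integration with respect to $s_1$ gives for the distribution of $b_2$
\[ dF \propto \frac{db_2}{\left\{\dfrac{1}{\sigma_1^2} - \dfrac{2\rho b_2}{\sigma_1\sigma_2} + \dfrac{b_2^2}{\sigma_2^2}\right\}^{n/2}} \]
or, on evaluation of the constant,
\[ dF = \frac{\Gamma(\tfrac{1}{2}n)}{\sqrt{\pi}\,\Gamma\{\tfrac{1}{2}(n-1)\}}\cdot\frac{\sigma_2^{n-1}(1-\rho^2)^{(n-1)/2}}{\sigma_1^{n-1}}\cdot\frac{db_2}{\left\{\dfrac{\sigma_2^2}{\sigma_1^2}(1-\rho^2) + \left(b_2 - \dfrac{\rho\sigma_2}{\sigma_1}\right)^2\right\}^{n/2}}. \tag{14.68} \]
The distribution of the regression coefficient $b_1$ is obtainable by interchanging the suffixes 1 and 2.

The form (14.68) is a Pearson Type VII distribution, symmetrical about the point $b_2 = \dfrac{\rho\sigma_2}{\sigma_1}$, the population regression coefficient. It tends to normality fairly rapidly, and the use of the standard error for regressions is therefore valid for lower values of n than in the case of the correlation coefficient. For small samples, however, (14.68) is not of much use since it depends on the unknown quantities $\sigma_1$, $\sigma_2$ and $\rho$, i.e. the population variances and covariance.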


14.21. It is possible to find statistics other than $b_1$ and $b_2$ which will provide a test of the regressions. Write
\[ u = \frac{s_1(b_2 - \beta_2)}{(s_2^2 - b_2^2 s_1^2)^{1/2}}. \tag{14.69} \]
We now return to the distribution of the quantities a, b, c of equation (14.41), namely,
\[ dF \propto \exp[-a + 2\rho b - c]\,(ac - b^2)^{(n-4)/2}\,da\,db\,dc. \tag{14.70} \]
We have from (14.69)
\[ u = \frac{b - \rho a}{\sqrt{ac - b^2}} \]
and on substituting for c in (14.70) we have, after a little reduction,
\[ dF \propto \exp\left[-a\left(1 + \frac{\rho^2}{u^2}\right)\right]\frac{da\,du}{a\,u^{n-1}}\cdot\exp\left[-\frac{u^2+1}{u^2}\left(\frac{b^2}{a} - 2\rho b\right)\right]|b - \rho a|^{n-2}\,db. \]
The integral of the second part on the right for b will be found to give a factor proportional to
\[ \exp\left\{\frac{a\rho^2(u^2+1)}{u^2}\right\}\left(\frac{a u^2}{u^2+1}\right)^{(n-1)/2} \]
and hence for the distribution of a and u we find
\[ dF \propto a^{(n-3)/2}\exp(-a + \rho^2 a)\,da\,\frac{du}{(1 + u^2)^{(n-1)/2}}. \]
Hence the distributions of a and u are independent, and for that of u we have
\[ dF \propto \frac{du}{(1 + u^2)^{(n-1)/2}}. \tag{14.71} \]
This distribution does not contain any of the parent parameters. If we put
\[ t = u\sqrt{n-2} = \frac{s_1(b_2 - \beta_2)\sqrt{n-2}}{(s_2^2 - b_2^2 s_1^2)^{1/2}} \tag{14.72} \]
then t is distributed in "Student's" form
\[ dF \propto \frac{dt}{\left(1 + \dfrac{t^2}{n-2}\right)^{(n-1)/2}} \tag{14.73} \]
and may be tested accordingly.


Example 14.9
In Example 14.3 we found for the regression of Y (potato yield) on X (wheat yield)
\[ Y - 6.065 = 0.0783\,(X - 15.792). \]
The regression coefficient is small. Could it have arisen from a population in which there is no correlation, i.e. in which $\beta_2 = 0$?
From Example 14.3 we have
\[ b_2 = 0.0783, \qquad \sqrt{n-2} = 6.7823, \qquad s_1^2 = 4.1749, \qquad s_2^2 = 0.5340. \]
Hence from (14.72)
\[ t = \frac{b_2 s_1\sqrt{n-2}}{\sqrt{s_2^2 - s_1^2 b_2^2}} = 2.00. \]
Appendix Table 3 does not carry us as far as $\nu = 46$. From the Fisher-Yates tables, however, we have the following values of t for P = 0.05:
\[ \nu = 40,\; t = 2.021; \qquad \nu = 60,\; t = 2.000, \]
and for P = 0.02:
\[ \nu = 40,\; t = 2.423; \qquad \nu = 60,\; t = 2.390. \]
Thus in our case P evidently lies between 0.05 and 0.02, and the regression may not be significant, i.e. the two variates may be independent. This confirms the conclusion reached in Example 14.6 from consideration of the correlation coefficient.


14.22. Up to this point we have considered the correlation coefficient mainly as a measure of the relationship between two variates, and this is the standpoint which will mainly concern us in this and the succeeding chapter. We may, however, turn for a time to a consideration of the regression equations, which have an importance of their own. Assuming that the regression is approximately linear, we have two equations
\[ X - \bar{x} = \beta_1(Y - \bar{y}), \qquad Y - \bar{y} = \beta_2(X - \bar{x}), \tag{14.74} \]
expressing the relations between the means of variate arrays and the variate-values determining those arrays.

A problem which frequently presents itself in practice is the following: given a member of the population exhibiting a variate-value x, what is its y-value? Evidently there is in general no unique answer to this question. For any given x there will be an


array of y's, any one of which might be exhibited by the member under consideration. 
But in the absence of any special knowledge it is reasonable to take as the best estimate 
of y the mean of this array. If the population is normal the mean will be the modal value, 
and if it is approximately normal the mean will be a reasonable estimate, the greater part 
of the population values lying distributed within a range of two or three times the standard 
deviation of the array. 

In fact, the question as put is too restrictive. There is no unique value of y corresponding to a given x, and we are entitled to enquire only after the distribution of y's or their principal characteristics.

Now the mean required is given by the regression equation, and hence that equation may be used to estimate the y-value corresponding to a given x. If at the same time the variance of the y-array can be determined, the probable limits of error of the estimate may also be assigned. This is particularly easy for normal populations because, as we have seen (14.8), the variance of all x-arrays is $\sigma_1^2(1 - \rho^2)$ and that of the y-arrays $\sigma_2^2(1 - \rho^2)$. As usual in large samples, we can use the sample values to calculate these variances; or we may take the variance of the array direct from observation.


Example 14.10
In Example 14.1 we found for the regression equations, in the units there employed,
\[ X - 0.7706 = -1.417\,(Y + 0.2095) \]
\[ Y + 0.2095 = -0.2658\,(X - 0.7706). \]
Suppose we require to estimate the highest audible pitch for a man 34 years of age. In
our units this corresponds to an x-value of $\tfrac{1}{3}(34 - 22) = 4$. Our estimate of y is then
\[ -0.2095 - 0.2658\,(4 - 0.7706) = -1.0679 \text{ units.} \]
This corresponds, in vibrations per second, to
\[ 19{,}995 - (1.0679) \times 2000 = 17{,}900 \text{ vibrations, approximately.} \]
The variance of the estimate is
\[ s^2(1 - r^2) = 13.3482\,\{1 - (0.6136)^2\} = 8.322 \text{ units}^2, \]
so that the standard error is $\sqrt{8.322} = 2.9$ units = 5.8 thousand vibrations. The estimate
is evidently not very accurate, for the value of y can vary within two or three times this
range without very great improbability.

If the problem had been set in the reverse form: what is the age corresponding to
a vibration of 17.9 thousands, we should have
\[ X = 0.7706 - 1.417\,(-1.0679 + 0.2095) = 1.99 \text{ units} = 27.98 \text{ years.} \]

This is not very close to 34 years, the age from which we started; and in general, if $\xi$ is
the estimate of x, given $y = \eta$, $\eta$ is not the estimate of y, given $x = \xi$. We have a right to
expect such a concordance only when r is near unity or when $\xi$ and $\eta$ are near the means
of the distribution, where the regression lines intersect.
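
The arithmetic of Example 14.10 can be retraced with a short sketch. The constants are those quoted in the example; the function names are illustrative only, and the unit conversions (3 years per x-unit from age 22, 2000 vibrations per y-unit from 19,995) are read off the worked figures above rather than from a stated definition.

```python
from math import sqrt

# Constants quoted in Example 14.10 (working units of Example 14.1).
X_MEAN, Y_MEAN = 0.7706, -0.2095
B_YX = -0.2658   # regression coefficient of Y on X
B_XY = -1.417    # regression coefficient of X on Y


def age_to_x(age):
    """Age in years to working units (assumed: origin 22 years, unit 3 years)."""
    return (age - 22) / 3


def x_to_age(x):
    """Working units back to years."""
    return 22 + 3 * x


def y_to_vibrations(y):
    """Working units to vibrations per second (assumed: origin 19,995, unit 2000)."""
    return 19995 + 2000 * y


def estimate_y(x):
    """Mean of the y-array at x, from the regression of Y on X."""
    return Y_MEAN + B_YX * (x - X_MEAN)


def estimate_x(y):
    """Mean of the x-array at y, from the regression of X on Y."""
    return X_MEAN + B_XY * (y - Y_MEAN)


y_hat = estimate_y(age_to_x(34))
vibrations = y_to_vibrations(y_hat)
std_error_units = sqrt(13.3482 * (1 - 0.6136 ** 2))
age_back = x_to_age(estimate_x(y_hat))

print(round(y_hat, 4), round(vibrations), round(std_error_units, 2), round(age_back, 2))
```

Running the two regressions in turn does not return to the starting age, which is the point made at the end of the example.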


The Correlation Ratios
14.23. For any bivariate distribution we have, if $\bar{x}_p$ is the mean of the pth x-array
and $\bar{x}$ the mean of the whole,
\[ \Sigma(x - \bar{x})^2 = \Sigma(x - \bar{x}_p)^2 + \Sigma(\bar{x}_p - \bar{x})^2, \tag{14.75} \]
the product term $2\Sigma(x - \bar{x}_p)(\bar{x}_p - \bar{x})$ vanishing because $\bar{x}_p - \bar{x}$ is constant for any given
array.
The correlation ratio of x on y, $\eta_{xy}$, is defined by
\[ \eta_{xy}^2 = \frac{\Sigma(\bar{x}_p - \bar{x})^2}{\Sigma(x - \bar{x})^2} \tag{14.76} \]
and similarly that of y on x, $\eta_{yx}$, by
\[ \eta_{yx}^2 = \frac{\Sigma(\bar{y}_p - \bar{y})^2}{\Sigma(y - \bar{y})^2}. \tag{14.77} \]
Analogously to (14.75) we have
\[ \Sigma(x - \beta y)^2 = \Sigma(x - \bar{x}_p)^2 + \Sigma(\bar{x}_p - \beta y)^2. \tag{14.78} \]
But, from (14.11), with an origin at the mean,
\[ \Sigma(x - \beta y)^2 = (1 - \rho^2)\,\Sigma(x - \bar{x})^2, \]
and from (14.76), the result remaining true for an origin at the mean,
\[ (1 - \eta_{xy}^2)\,\Sigma(x - \bar{x})^2 = \Sigma(x - \bar{x}_p)^2. \]
Taking these results in conjunction with (14.78) we find
\[ \Sigma(x - \bar{x})^2\,(\eta_{xy}^2 - \rho^2) = \Sigma(\bar{x}_p - \beta y)^2. \]
Hence $\eta$ cannot be less than $\rho$. If and only if $\eta = \rho$, $\bar{x}_p - \beta y$ vanishes for each
array, i.e. the regression is linear. $\eta^2 - r^2$ may thus be used as an index of linearity of
regression.
Example 14.11

The calculation of the correlation ratios is based on equation (14.76). As an illustration
we will find those for the data of Table 14.1. The means of the horizontal arrays and the
array frequencies are shown in Table 14.5.

TABLE 14.5
Calculation of the Correlation Ratio $\eta_{xy}$ for the Data of Table 14.1.

  Highest
  Audible Pitch   Frequency    Mean $\bar{x}_p$    $\bar{x}_p - \bar{x}$    $(\bar{x}_p - \bar{x})^2$
       5-               3       4.666667       3.896025      15.179011
       7-              45       9.111111       8.340469      69.563423
       9-              10       9.700000       8.929358      79.733434
      11-             104       8.817308       8.046666      64.748834
      13-              93       6.333333       5.562691      30.943531
      15-             310       3.022581       2.251939       5.071229
      17-             576       1.064236       0.293594       0.086197
      19-            1051       0.101808      -0.668834       0.447339
      21-             957      -0.801463      -1.572105       2.471514
      23-             165      -1.278788      -2.049430       4.200163
      25-              41      -1.512195      -2.282837       5.211345
      27-              16      -1.562500      -2.333142       5.443552
      29-               2      -1.000000      -1.770642       3.135173
      31-               2      -3.000000      -3.770642      14.217741
      33-               4      -1.750000      -2.520642       6.353636

We have already found that
\[ \Sigma(x^2) = 47{,}392, \qquad \Sigma(x) = 2604, \]
from which
\[ \Sigma(x - \bar{x})^2 = \Sigma(x^2) - \frac{1}{N}\{\Sigma(x)\}^2 = 45{,}385.25. \]
From the table we now have
\[ \Sigma(\bar{x}_p - \bar{x})^2 = 19{,}095.88. \]
It should be noticed that in forming this sum we multiply each $(\bar{x}_p - \bar{x})^2$ in the last column
of Table 14.5 by the corresponding frequency in the second column, for the summation
takes place over all values of x.

We then find
\[ \eta_{xy}^2 = \frac{19{,}095.88}{45{,}385.25} = 0.420751, \]
giving $\eta_{xy} = 0.6487$. Similarly it may be shown that $\eta_{yx} = 0.6231$. The correlation
coefficient is $-0.6136$.
We have
\[ \eta_{xy}^2 - r^2 = 0.044, \qquad \eta_{yx}^2 - r^2 = 0.012. \]

These values are close to zero and the regressions are thus approximately linear.
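
As a check on Example 14.11, the following sketch recomputes $\eta_{xy}^2$ from the frequencies and array means of Table 14.5 together with the sums quoted in the example. The variable names are illustrative; only the figures come from the text.

```python
from math import sqrt

# (frequency, array mean) for each row of Table 14.5.
rows = [
    (3, 4.666667), (45, 9.111111), (10, 9.700000), (104, 8.817308),
    (93, 6.333333), (310, 3.022581), (576, 1.064236), (1051, 0.101808),
    (957, -0.801463), (165, -1.278788), (41, -1.512195), (16, -1.562500),
    (2, -1.000000), (2, -3.000000), (4, -1.750000),
]

sum_x, sum_x2 = 2604, 47392          # sums quoted in the example
n_total = sum(freq for freq, _ in rows)
x_bar = sum_x / n_total

# Each squared deviation of an array mean is weighted by the array frequency,
# because the summation in (14.76) runs over every observation.
between = sum(freq * (mean - x_bar) ** 2 for freq, mean in rows)
total = sum_x2 - sum_x ** 2 / n_total

eta_sq = between / total
eta = sqrt(eta_sq)
linearity_index = eta_sq - 0.6136 ** 2   # eta^2 - r^2, with r = -0.6136

print(n_total, round(between, 2), round(total, 2), round(eta, 4), round(linearity_index, 3))
```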


14.24. We shall see in the next chapter that $\eta^2$ is closely related to a statistic R, the
multiple correlation coefficient, which is of rather greater importance. We accordingly
defer a full discussion of the sampling distribution of $\eta^2$ until that chapter, but will here
derive it in the special case of samples from an uncorrelated bivariate population.
From (14.75) and (14.76) we have
\[ \frac{\eta^2}{1 - \eta^2} = \frac{\Sigma(\bar{x}_p - \bar{x})^2}{\Sigma(x - \bar{x}_p)^2}. \tag{14.79} \]

Now if the population is normal and the arrays are of narrow width, the distribution
in each array will be normal. We have already seen that in a normal distribution the
mean is distributed independently of the variance. Hence $\Sigma(x - \bar{x}_p)^2$, which is the sum
of numerical multiples of array variances, is independent of the array means and hence of
the quantity $\Sigma(\bar{x}_p - \bar{x})^2$. Thus the numerator and denominator of (14.79) are independent.

Further, if the variates are uncorrelated and therefore (in the normal case) independent,
the distributions in parent arrays have all the same mean and variance, those of the total
distribution. Without loss of generality we may take the mean to be zero and the variance
to be unity.

It was seen in Example 10.5 that the sum of squares of $\nu$ variates, each distributed
normally with zero mean and unit variance, is given by
\[ dF \propto e^{-\frac{1}{2}t}\, t^{\frac{1}{2}(\nu - 2)}\, dt \tag{14.80} \]
and that the distribution of the sum of squares about the mean is the same in form with
$\nu$ reduced by unity. Now $\Sigma(x - \bar{x}_p)^2$ summed over a given array of $N_p$ members is the
sum of squares about the mean of $N_p$ variates and is thus distributed in the form (14.80)
with $N_p - 1$ degrees of freedom. Thus the sum of $(x - \bar{x}_p)^2$ for the whole array will be
distributed in the form (14.80) with $\Sigma(N_p - 1) = N - p$ degrees of freedom, i.e. as
\[ dF \propto e^{-\frac{1}{2}t}\, t^{\frac{1}{2}(N - p - 2)}\, dt. \tag{14.81} \]
The mean $\bar{x}_p$ will be distributed in the normal form
\[ dF \propto e^{-\frac{1}{2}N_p \bar{x}_p^2}\, d\bar{x}_p, \]
and consequently $\Sigma(\bar{x}_p - \bar{x})^2$, which is equal to $\Sigma_p N_p(\bar{x}_p - \bar{x})^2$ (the summation now
extending over the p arrays), will be distributed in the form (14.80) with $p - 1$ degrees of
freedom; i.e., writing u for the sum, as
\[ dF \propto e^{-\frac{1}{2}u}\, u^{\frac{1}{2}(p - 3)}\, du. \tag{14.82} \]
To find the distribution of $\eta^2/(1 - \eta^2)$ we then have to find that of $u/t$, t and u being
independent.
We have for the joint distribution
\[ dF \propto \exp\{-\tfrac{1}{2}(t + u)\}\, t^{\frac{1}{2}(N - p - 2)}\, u^{\frac{1}{2}(p - 3)}\, dt\, du. \tag{14.83} \]
Put $\xi = t/u$, $\zeta = t + u$.
The Jacobian of the transformation is
\[ \frac{\partial(\xi, \zeta)}{\partial(t, u)} = \frac{t + u}{u^2} \]
and (14.83) becomes
\[ dF \propto e^{-\frac{1}{2}\zeta}\, \zeta^{\frac{1}{2}(N - 3)}\, d\zeta \cdot \frac{\xi^{\frac{1}{2}(N - p - 2)}}{(1 + \xi)^{\frac{1}{2}(N - 1)}}\, d\xi. \]
Thus $\xi$ and $\zeta$ are independent and we have for the distribution of $\xi$
\[ dF \propto \frac{\xi^{\frac{1}{2}(N - p - 2)}}{(1 + \xi)^{\frac{1}{2}(N - 1)}}\, d\xi, \tag{14.84} \]
whence, on putting $\xi = \dfrac{1 - \eta^2}{\eta^2}$, we find
\[ dF \propto (1 - \eta^2)^{\frac{1}{2}(N - p - 2)}\, (\eta^2)^{\frac{1}{2}(p - 3)}\, d(\eta^2) \]
\[ \phantom{dF} = \frac{1}{B\!\left(\frac{p - 1}{2}, \frac{N - p}{2}\right)}\, (\eta^2)^{\frac{1}{2}(p - 3)}\, (1 - \eta^2)^{\frac{1}{2}(N - p - 2)}\, d(\eta^2), \tag{14.85} \]
which is the distribution required.


14.25. The distribution function of (14.85), which is a Pearson Type I curve, may
be obtained from the incomplete B-function. It is sufficient for ordinary purposes, however,
to use the tabulated forms of Fisher's z-distribution (Example 10.18). In fact, putting
in (14.85)
\[ \nu_1 = p - 1, \qquad \nu_2 = N - p, \qquad e^{2z} = \frac{\eta^2}{1 - \eta^2} \cdot \frac{\nu_2}{\nu_1}, \]
we obtain the form of equation (10.62). Appendix Tables 4 and 5 give the values of z, such that equal
or greater values will be attained with probability 0.05 and 0.01. These tables are due
to Fisher and reproduced from his Statistical Methods for Research Workers. In practice,
however, $\eta^2$ is only calculated for large values of N outside the range of these tables, and
we may either use the approximation suggested therein or special Tables by T. L. Woo
reproduced in Tables for Statisticians and Biometricians, Part I.


14.26. It is easy to show that the first two moments of (14.85) and the constants
$\gamma_1$ and $\gamma_2$ are given by
\[ \mu_1' = \frac{p - 1}{N - 1} \tag{14.86} \]
\[ \mu_2 = \frac{2(p - 1)(N - p)}{(N - 1)^2 (N + 1)} \tag{14.87} \]
\[ \gamma_1 = \frac{N - 2p + 1}{N + 3} \sqrt{\frac{8(N + 1)}{(p - 1)(N - p)}} \tag{14.88} \]
\[ \gamma_2 = \frac{12\{N^3 + N^2(4 - 5p) + N(5p^2 - 12p + 6) + 7p^2 - 7p + 1\}}{(p - 1)(N - p)(N + 3)(N + 5)}. \tag{14.89} \]
Thus, to order $N^{-1}$,
\[ \gamma_1 = \sqrt{\frac{8}{p - 1}}, \qquad \gamma_2 = \frac{12}{p - 1}, \]
and thus $\eta^2$ does not tend to normality for large N for any finite number of arrays p.
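
The moments in 14.26 can be checked against the standard moments of a B-distribution with parameters $a = \tfrac{1}{2}(p - 1)$ and $b = \tfrac{1}{2}(N - p)$, which is the form of (14.85). The sketch below is illustrative: the function names are not from the text, and the general B-distribution moment formulae are the usual textbook ones rather than anything derived in this chapter.

```python
from math import sqrt


def beta_moments(a, b):
    """Mean, variance, skewness and excess kurtosis of a B(a, b) variate."""
    s = a + b
    mean = a / s
    var = a * b / (s ** 2 * (s + 1))
    skew = 2 * (b - a) * sqrt(s + 1) / ((s + 2) * sqrt(a * b))
    kurt = 6 * ((a - b) ** 2 * (s + 1) - a * b * (s + 2)) / (a * b * (s + 2) * (s + 3))
    return mean, var, skew, kurt


def eta_sq_moments(n, p):
    """Equations (14.86) to (14.89) for eta^2 under independence."""
    mean = (p - 1) / (n - 1)
    var = 2 * (p - 1) * (n - p) / ((n - 1) ** 2 * (n + 1))
    g1 = (n - 2 * p + 1) / (n + 3) * sqrt(8 * (n + 1) / ((p - 1) * (n - p)))
    g2 = (12 * (n ** 3 + n ** 2 * (4 - 5 * p) + n * (5 * p ** 2 - 12 * p + 6)
                + 7 * p ** 2 - 7 * p + 1)
          / ((p - 1) * (n - p) * (n + 3) * (n + 5)))
    return mean, var, g1, g2


n, p = 50, 5
from_beta = beta_moments((p - 1) / 2, (n - p) / 2)
from_text = eta_sq_moments(n, p)
print(from_beta)
print(from_text)

# For very large N the shape constants approach sqrt(8/(p-1)) and 12/(p-1),
# which do not vanish: eta^2 does not tend to normality.
_, _, g1_large, g2_large = eta_sq_moments(10 ** 7, p)
print(g1_large, g2_large)
```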


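Tetrachoric r

14.27. We now proceed to consider two coefficients designed for the measurement
of dependence and based on the product-moment correlation coefficient, tetrachoric r and
biserial $\eta$. Both these coefficients are, in effect, estimates of a putative product-moment
correlation for data which are not specified with the detail of an ordinary bivariate table.
Suppose we have a fourfold table

         a     |    b     |  a + b
         c     |    d     |  c + d
      ---------+----------+---------
       a + c   |  b + d   |    N

If this table is derived by a double dichotomy of a bivariate frequency-distribution
\[ z = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right\} \]
we may ask, what is the value of $\rho$ in terms of a, b, c, d and N? This problem is, in fact,
determinate.

If the population is normal the array totals will be normal, and thus the frequencies
(a + c) and (b + d) correspond to a dichotomy of the normal curve, i.e. there exists an
$h'$ such that
\[ \frac{1}{\sigma_1\sqrt{2\pi}} \int_{-\infty}^{h'} \exp\left(-\frac{x^2}{2\sigma_1^2}\right) dx = \frac{a + c}{N}, \qquad
   \frac{1}{\sigma_1\sqrt{2\pi}} \int_{h'}^{\infty} \exp\left(-\frac{x^2}{2\sigma_1^2}\right) dx = \frac{b + d}{N}. \tag{14.92} \]
Putting $h = h'/\sigma_1$ we have
\[ \int_{-\infty}^{h} \exp(-\tfrac{1}{2}x^2)\, dx = \frac{(a + c)\sqrt{2\pi}}{N}, \qquad
   \int_{h}^{\infty} \exp(-\tfrac{1}{2}x^2)\, dx = \frac{(b + d)\sqrt{2\pi}}{N}, \tag{14.93} \]
so that h can be derived from the tables of the normal integral.
Similarly there will be a k defined by
\[ \int_{k}^{\infty} \exp(-\tfrac{1}{2}y^2)\, dy = \frac{(c + d)\sqrt{2\pi}}{N}. \]
We then require to solve for $\rho$ the equation
\[ \frac{d}{N} = \frac{1}{2\pi\sqrt{1 - \rho^2}} \int_{h}^{\infty}\!\!\int_{k}^{\infty} \exp\left\{-\frac{x^2 - 2\rho xy + y^2}{2(1 - \rho^2)}\right\} dx\, dy. \tag{14.94} \]
We will expand the integral on the right in ascending powers of $\rho$. The characteristic
function of the distribution is
\[ \phi(t, u) = \exp\{-\tfrac{1}{2}(t^2 + 2\rho tu + u^2)\}. \]
Thus
\[ \frac{d}{N} = \frac{1}{4\pi^2} \int_{h}^{\infty}\! dx \int_{k}^{\infty}\! dy \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \phi(t, u)\, e^{-itx - iuy}\, dt\, du \]
\[ \phantom{\frac{d}{N}} = \frac{1}{4\pi^2} \int_{h}^{\infty}\! dx \int_{k}^{\infty}\! dy \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \exp\{-\tfrac{1}{2}(t^2 + u^2) - itx - iuy\} \sum_{j=0}^{\infty} \frac{(-\rho)^j t^j u^j}{j!}\, dt\, du. \tag{14.95} \]
The coefficient of $(-\rho)^j/j!$ is the product of two integrals, the first of which is
\[ \frac{1}{2\pi} \int_{h}^{\infty}\! dx \int_{-\infty}^{\infty} \exp(-\tfrac{1}{2}t^2 - itx)\, t^j\, dt \]
and the second a similar expression in k, y and u. Now from 6.24 the integral with respect
to t is
\[ (-i)^j \sqrt{2\pi}\, H_j(x)\, e^{-\frac{1}{2}x^2} \]
and hence the double integral is
\[ (-i)^j \frac{1}{\sqrt{2\pi}}\, H_{j-1}(h)\, e^{-\frac{1}{2}h^2} \]
(for j = 0 the expression is to be read as the tail area of the normal curve beyond h).
Hence, from (14.95),
\[ \frac{d}{N} = \sum_{j=0}^{\infty} \frac{\rho^j}{j!} \cdot \frac{1}{2\pi}\, H_{j-1}(h)\, H_{j-1}(k)\, e^{-\frac{1}{2}(h^2 + k^2)}. \]
In the notation of 6.27 we write $\tau_j$ for the tetrachoric function of h and $\tau_j'$ for that of k,
and we then have
\[ \frac{d}{N} = \sum_{j=0}^{\infty} \rho^j\, \tau_j\, \tau_j'. \tag{14.96} \]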


The tetrachoric functions have been tabled up to ть (Tables for Statisticians and Bio- 
metricians, Parts I and П) and, with their aid, (14.96) can be solved by successive approxi- 
mation. Examples will be found in the introduction to the Tables. 
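
Where tables are not to hand, the tetrachoric series can be summed numerically. The sketch below is only an illustration under stated assumptions: it takes $\tau_0(h)$ to be the tail area beyond h and $\tau_j(h) = H_{j-1}(h)\,e^{-h^2/2}/\sqrt{2\pi\, j!}$ for $j \ge 1$, truncates the series at a fixed number of terms, and solves for $\rho$ by bisection (which presumes the truncated series increases with $\rho$ over the search interval). The function names and the illustrative table are hypothetical, not taken from the text.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def tetrachoric_functions(h, terms):
    """Return [tau_0(h), ..., tau_{terms-1}(h)] under the convention stated above."""
    taus = [1.0 - NormalDist().cdf(h)]          # tau_0: tail area beyond h
    density = exp(-0.5 * h * h) / sqrt(2 * pi)
    g_prev, g = 0.0, 1.0                        # g_m = H_m(h) / sqrt(m!), starting at m = 0
    for j in range(1, terms):
        taus.append(g * density / sqrt(j))      # tau_j = g_{j-1} * density / sqrt(j)
        m = j - 1
        # Scaled Hermite recurrence: H_{m+1} = h * H_m - m * H_{m-1}.
        g_prev, g = g, (h * g - sqrt(m) * g_prev) / sqrt(m + 1)
    return taus


def tetrachoric_r(a, b, c, d, terms=100):
    """Solve d/N = sum_j rho^j tau_j(h) tau_j(k) for rho by bisection."""
    n = a + b + c + d
    h = NormalDist().inv_cdf((a + c) / n)       # dichotomy of x, as in (14.93)
    k = NormalDist().inv_cdf((a + b) / n)       # dichotomy of y
    tau_h = tetrachoric_functions(h, terms)
    tau_k = tetrachoric_functions(k, terms)
    target = d / n

    def series(rho):
        return sum(rho ** j * th * tk for j, (th, tk) in enumerate(zip(tau_h, tau_k)))

    lo, hi = -0.999, 0.999
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if series(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical table dichotomised at both medians with d/N = 1/3.
r_tet = tetrachoric_r(400, 200, 200, 400)
print(round(r_tet, 4))
```

For a table cut at both medians the result can be compared with the closed form of Exercise 14.4, $\rho = \cos(1 - 2\alpha)\pi$ with $\alpha = d/N$.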


14.28. It is to be realised that tl 
equation (14.96) is not a product-moment correlation 
pina bivariate normal population. It із not an estima 
in non-normal populations, Its practical use is limited la 


Karl Pearson (1913) has given expressions for these quantiti 
the distribution of tetrachoric r it is not clear how far the use 


Biserial $\eta$
14.29. Suppose now that we have a 2 x q-fold table, the dichotomy being according
to some qualitative factor and the other classification either to a numerical variate or to
some variate permitting the arrangement of the classes in order. Table 14.6 will illustrate
the type of material under discussion. The data relate to 1426 criminals classified
according to whether they were alcoholic or not and according to the crime for which they
were imprisoned. The order of the crime-classification is determined by its relationship
with intelligence, arson being associated with low intelligence and fraud with high.

TABLE 14.6

Showing 1426 Criminals classified according to Alcoholism and Type of Crime.

(C. Goring's data, quoted by K. Pearson, 1909.)

                   Arson.   Rape.   Violence.   Stealing.   Coining.   Fraud.   TOTALS
  Alcoholic          50      88       155         379          18        63       753
  Non-alcoholic      43      62       110         300          14       144       673
  TOTALS             93     150       265         679          32       207      1426


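If the population is normal, $\eta = \rho$. We have
\[ \eta_{yx}^2 = \frac{\Sigma N_p(\bar{y}_p - \bar{y})^2}{N \operatorname{var} y}
   = \frac{\Sigma\,(N_p \bar{y}_p^2 - 2N_p \bar{y}_p \bar{y} + N_p \bar{y}^2)}{N \operatorname{var} y}
   = \Sigma\left(\frac{N_p \operatorname{var} y_p}{N \operatorname{var} y} \cdot \frac{\bar{y}_p^2}{\operatorname{var} y_p}\right) - \frac{\bar{y}^2}{\operatorname{var} y}, \tag{14.97} \]
since $\Sigma N_p \bar{y}_p = N\bar{y}$.
Thus
\[ \eta^2 = \Sigma\left(\frac{N_p \operatorname{var} y_p}{N \operatorname{var} y} \cdot \frac{\bar{y}_p^2}{\operatorname{var} y_p}\right) - \frac{\bar{y}^2}{\operatorname{var} y}. \tag{14.98} \]

But the mean variance of arrays, weighted according to the numbers in arrays,
$= \operatorname{var} y\,(1 - \rho^2) = \operatorname{var} y\,(1 - \eta^2)$. Taking this as equal to $\operatorname{var} y_p$ we have
\[ \eta^2 = (1 - \eta^2)\, \Sigma\left(\frac{N_p}{N} \cdot \frac{\bar{y}_p^2}{\operatorname{var} y_p}\right) - \frac{\bar{y}^2}{\operatorname{var} y}, \]
giving
\[ \eta^2 = \frac{\dfrac{1}{N}\Sigma\left(\dfrac{N_p \bar{y}_p^2}{\operatorname{var} y_p}\right) - \dfrac{\bar{y}^2}{\operatorname{var} y}}{1 + \dfrac{1}{N}\Sigma\left(\dfrac{N_p \bar{y}_p^2}{\operatorname{var} y_p}\right)}. \tag{14.99} \]
The use of this expression lies in the fact that the quantities in it can be estimated
from the data on certain assumptions. If we suppose that the quantity according to which
dichotomy has been made (in our example, alcoholism) is capable of representation by
a variate which is normally distributed, and thus that each y-array is a dichotomy of
a normal curve, the quantities $\dfrac{\bar{y}}{\sqrt{\operatorname{var} y}}$ and $\dfrac{\bar{y}_p}{\sqrt{\operatorname{var} y_p}}$ can be obtained from the tables of
the normal integral. For example, the two frequencies alcoholic and non-alcoholic are,
for arson, 50 and 43. Thus the proportional frequency in the alcoholic group is $\tfrac{50}{93} = 0.5376$
and the deviation corresponding to this frequency is seen from the tables to be 0.0944,
which is thus $\dfrac{\bar{y}_p}{\sqrt{\operatorname{var} y_p}}$ for this y-array.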

Example 14.12

For the data of Table 14.6 the proportional frequencies, the values of $\bar{y}_p/\sqrt{\operatorname{var} y_p}$ and
the $N_p$ are as follows:

                              Arson.    Rape.   Violence.  Stealing.  Coining.   Fraud.    TOTAL
  Alcoholic (proportion)      0.5376   0.5867    0.5849     0.5582     0.5625    0.3043    0.5281
  $\bar{y}_p/\sqrt{\operatorname{var} y_p}$             0.0944   0.2190    0.2144     0.1463     0.1573   -0.5119    0.0704
  $N_p$                           93      150       265        679         32       207      1426

Then from (14.99) we have
\[ \eta^2 = \frac{\dfrac{1}{1426}\{93(0.0944)^2 + \ldots\} - (0.0704)^2}{1 + \dfrac{1}{1426}\{93(0.0944)^2 + \ldots\}} = 0.05456, \]
giving
\[ \eta = 0.234, \]
which, on our various assumptions, may be taken as approximating to the supposed product-
moment correlation coefficient.

As for tetrachoric r, the sampling distribution of biserial $\eta$ is unknown. Expressions
for its sampling variance have been derived by K. Pearson (1915), but are to be used with
considerable reserve.
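
The computation of Example 14.12 can be sketched as follows, with the normal deviates taken from the inverse normal integral instead of printed tables. The function name is illustrative; the formula is (14.99) and the frequencies are those of Table 14.6. The sketch assumes every proportion lies strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist

# Table 14.6: arson, rape, violence, stealing, coining, fraud.
alcoholic = [50, 88, 155, 379, 18, 63]
non_alcoholic = [43, 62, 110, 300, 14, 144]


def biserial_eta(group_a, group_b):
    """Biserial eta from equation (14.99), normal deviates by inverse integral."""
    nd = NormalDist()
    array_sizes = [a + b for a, b in zip(group_a, group_b)]
    n_total = sum(array_sizes)
    # Deviate corresponding to the proportion in the first group, per array.
    deviates = [nd.inv_cdf(a / n) for a, n in zip(group_a, array_sizes)]
    deviate_total = nd.inv_cdf(sum(group_a) / n_total)
    weighted = sum(n * d * d for n, d in zip(array_sizes, deviates)) / n_total
    eta_sq = (weighted - deviate_total ** 2) / (1 + weighted)
    return sqrt(eta_sq)


eta_biserial = biserial_eta(alcoholic, non_alcoholic)
print(round(eta_biserial, 3))
```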


14.30. Something may also be said about the assumptions on which tetrachoric r and
biserial $\eta$ are based, particularly that of normality. In supposing that a given fourfold
table is the double dichotomy of a normal population, we are assuming that the attributes
or variates concerned are capable of representation on a normal scale and that it was, in
fact, this scale which determined the classification given. This assumption is evidently
a considerable one and cannot always be made with much confidence. In dividing criminals
into alcoholic and non-alcoholic it would, for example, be assumed that "alcoholism"
is a quantity which varies continuously from one subject to another; or perhaps that
propensity to alcoholism was such a variate. At one end of the scale we should have
chronic inebriety, at the other the most austere teetotalism. It would be further assumed
that if the degree of alcoholism could be measured, the population of criminals would be
distributed according to the alcoholic variate in a normal form; and it would be further
assumed that the data which are given would have been arrived at by a dichotomy of the
population according to the variate. How far assumptions of this kind are justified depends
on previous knowledge and the circumstances of individual cases; but even so it remains
largely a matter of personal opinion. The reader will meet widely divergent views in the
literature of the subject.


Intra-class Correlation
14.31. There sometimes arise, mainly in biological work, cases in which we require
the correlation between members of one or more families. We might, for example, wish
to examine the correlation between heights of brothers. The question then arises, which
brother is to be related to which? In the simplest case we might have a number
of families each containing two brothers. Our correlation table has two variates, both
height, and we have to decide which brother is to be related to which variate. We might
take the elder brother first, or the taller brother; but this would provide the answer to
different questions, the correlation between elder and younger brothers, or between taller
and shorter brothers; not the correlation between brothers in general.

The problem is met by entering in the correlation table both possible pairs, i.e. those
obtained by taking both brothers first. Generally, if the family contains k members, there
will be k(k - 1) entries, each member being taken first in association with each other
member second. If there are p families with $k_1, k_2, \ldots, k_p$ members there will be
$\sum_{i=1}^{p} k_i(k_i - 1) = N$ entries in the correlation table. As a simple illustration consider five
families of three brothers with heights

    1st family    69, 70, 72 inches
    2nd family    70, 71, 72   ,,
    3rd family    71, 72, 72   ,,
    4th family    68, 70, 70   ,,
    5th family    71, 72, 73   ,,

There will be 30 entries in the table, which will be as follows:


                              Height (inches).

                      68    69    70    71    72    73   TOTALS
                68     -     -     2     -     -     -      2
                69     -     -     1     -     1     -      2
    Height      70     2     1     2     1     2     -      8
    (inches).   71     -     -     1     -     4     1      6
                72     -     1     2     4     2     1     10
                73     -     -     -     1     1     -      2
            TOTALS     2     2     8     6    10     2     30

Here, for example, the pair 69, 70 in the first family is entered as (69, 70) and (70, 69)
and the pair 72, 72 in the third family twice as (72, 72).

The table is symmetrical about the diagonal, as it evidently must be. We may calculate
the product-moment coefficient in the usual way. We find var x = var y = 1.716,
cov (xy) = 0.516 and hence $\rho = \dfrac{0.516}{1.716} = 0.301$.

The actual compilation of such a table is, however, both tedious and unnecessary.
The coefficient $\rho$ can be found by direct methods, as follows:

Suppose there are p families, with variate-values
$x_{11}, \ldots, x_{1k_1};\; x_{21}, \ldots, x_{2k_2};\; \ldots;\; x_{p1}, \ldots, x_{pk_p}$, the families numbering
$k_1, k_2, \ldots, k_p$. In the correlation table each member of the ith family will appear
$k_i - 1$ times (in association with the other members of the family), and thus the mean
of each variate is given by
\[ \bar{x} = \frac{1}{N} \sum_i (k_i - 1) \sum_j x_{ij}, \tag{14.100} \]
the first summation taking place over the p families and the second over all members of
the ith family. Similarly
\[ \operatorname{var} x = \operatorname{var} y = \frac{1}{N} \sum_i (k_i - 1) \sum_j (x_{ij} - \bar{x})^2, \tag{14.101} \]
and
\[ \operatorname{cov}(xy) = \frac{1}{N} \sum_i \sum_{j,l} (x_{ij} - \bar{x})(x_{il} - \bar{x}), \tag{14.102} \]
the summation extending over all possible pairs for which $j \neq l$. Thus the coefficient $\rho$ is
given by
\[ \rho = \frac{\sum_i \sum_{j \neq l} (x_{ij} - \bar{x})(x_{il} - \bar{x})}{\sum_i (k_i - 1) \sum_j (x_{ij} - \bar{x})^2}. \tag{14.103} \]
This can be thrown into a rather more convenient form. We have
\[ \sum_i \sum_{j \neq l} (x_{ij} - \bar{x})(x_{il} - \bar{x}) = \sum_i {\sum_{j,l}}' (x_{ij} - \bar{x})(x_{il} - \bar{x}) - \sum_i \sum_j (x_{ij} - \bar{x})^2 \]
(where the summation $\Sigma'$ now extends over all possible pairs, including j = l)
\[ = \sum_i k_i^2 (\bar{x}_i - \bar{x})^2 - \sum_i \sum_j (x_{ij} - \bar{x})^2, \tag{14.104} \]
$\bar{x}_i$ being the mean of the ith family.
Thus
\[ \rho = \frac{\sum_i k_i^2 (\bar{x}_i - \bar{x})^2 - \sum_i \sum_j (x_{ij} - \bar{x})^2}{\sum_i (k_i - 1) \sum_j (x_{ij} - \bar{x})^2}. \tag{14.105} \]
If all the families have the same number of members this formula is somewhat simplified.
Denoting by v the variance of x, and by $v_m$ the variance of means of families (about the
mean $\bar{x}$), we have
\[ \rho = \frac{pk^2 v_m - pkv}{(k - 1)pkv} = \frac{1}{k - 1}\left(\frac{k v_m}{v} - 1\right). \tag{14.106} \]

The coefficient $\rho$ is called the Intra-class Correlation Coefficient, to distinguish it from
the ordinary product-moment coefficient.
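
The equivalence of the two routes can be illustrated on the five families of brothers. The sketch below computes the coefficient once by building every ordered pair, as in the symmetrical table, and once by the direct formula (14.105). The function names are illustrative, and each family is assumed to have at least two members.

```python
families = [
    [69, 70, 72],
    [70, 71, 72],
    [71, 72, 72],
    [68, 70, 70],
    [71, 72, 73],
]


def intraclass_by_pairs(families):
    """Product-moment correlation over all ordered pairs within each family."""
    firsts, seconds = [], []
    for members in families:
        for j, a in enumerate(members):
            for l, b in enumerate(members):
                if j != l:
                    firsts.append(a)
                    seconds.append(b)
    n = len(firsts)
    mean = sum(firsts) / n                      # same for both margins by symmetry
    var = sum((x - mean) ** 2 for x in firsts) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(firsts, seconds)) / n
    return cov / var


def intraclass_direct(families):
    """Equation (14.105), without compiling the correlation table."""
    n_entries = sum(len(m) * (len(m) - 1) for m in families)
    mean = sum((len(m) - 1) * sum(m) for m in families) / n_entries   # (14.100)
    between = sum(len(m) ** 2 * (sum(m) / len(m) - mean) ** 2 for m in families)
    within_total = sum((x - mean) ** 2 for m in families for x in m)
    denominator = sum((len(m) - 1) * sum((x - mean) ** 2 for x in m) for m in families)
    return (between - within_total) / denominator


rho_pairs = intraclass_by_pairs(families)
rho_direct = intraclass_direct(families)
print(round(rho_pairs, 3), round(rho_direct, 3))
```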


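Example 14.13

Let us use formula (14.106) to find the intra-class coefficient for the example of the
above section. With a working mean at 70 inches, the values of the variates are

    -1, 0, 2;   0, 1, 2;   1, 2, 2;   -2, 0, 0;   1, 2, 3.

Hence
\[ \bar{x} = \frac{13}{15}, \qquad \operatorname{var}(x) = \frac{37}{15} - \left(\frac{13}{15}\right)^2 = \frac{386}{225}. \]
The means of families are
\[ \frac{5}{15}, \quad \frac{15}{15}, \quad \frac{25}{15}, \quad -\frac{10}{15}, \quad \frac{30}{15} \]
and the deviations from $\bar{x}$
\[ -\frac{8}{15}, \quad \frac{2}{15}, \quad \frac{12}{15}, \quad -\frac{23}{15}, \quad \frac{17}{15}. \]
Thus
\[ v_m = \frac{1}{5}\left\{\left(\frac{8}{15}\right)^2 + \left(\frac{2}{15}\right)^2 + \left(\frac{12}{15}\right)^2 + \left(\frac{23}{15}\right)^2 + \left(\frac{17}{15}\right)^2\right\} = \frac{1030}{1125}. \]
Hence, from (14.106),
\[ \rho = \frac{1}{2}\left\{\frac{3 \times 1030 \times 225}{1125 \times 386} - 1\right\} = 0.301, \]
a result we have already found directly.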


14.32. One caution is necessary in the interpretation of the intra-class correlation
coefficient. From (14.106) it is seen that intra-class $\rho$ cannot be less than $-\dfrac{1}{k - 1}$, though
it may attain + 1. It is thus a skew coefficient in the sense that, unlike product-moment
correlation and association, a negative value has not the same significance (as a departure
from independence) as the equivalent positive value.


14.33. The sampling distribution of intra-class $\rho$ for the case of a normal population
and equal numbers in families may be obtained as follows:

It may be shown, precisely as in 14.24, that the ratio of two sums of squares about
means, $\xi = \dfrac{v_1}{v_2}$, based on N - p and p - 1 degrees of freedom, is distributed as
\[ dF \propto \frac{\xi^{\frac{1}{2}(N - p - 2)}\, d\xi}{\left(1 + \xi\, \dfrac{\sigma_2^2}{\sigma_1^2}\right)^{\frac{1}{2}(N - 1)}} \tag{14.107} \]
provided that the sums are independent and emanate from normal populations. Here
$\sigma_1^2$, $\sigma_2^2$ are the population variances relating to $v_1$, $v_2$ respectively.
Consider now p families of k members, pk in all, as p samples of k from a normal
population in which the intra-class coefficient is $\lambda$. Writing l for the sample intra-class
coefficient we have
\[ l = \frac{1}{k - 1}\left\{\frac{k}{1 + \xi} - 1\right\}, \tag{14.108} \]
where $\xi = \dfrac{v - v_m}{v_m}$. Now $v_m$ relates to means of samples and is distributed independently
of $v - v_m$, as in the case of (14.79). We may therefore substitute for $\xi$ in (14.107), with
N = pk and p = p. Furthermore, if $\sigma_1^2$ and $\sigma_2^2$ are the population variances relating to
$v - v_m$ and $v_m$, we have
\[ \lambda = \frac{1}{k - 1}\left\{\frac{k\sigma_2^2}{(k - 1)\sigma_1^2 + \sigma_2^2} - 1\right\}, \qquad \text{so that} \qquad \frac{\sigma_2^2}{\sigma_1^2} = \frac{1 + (k - 1)\lambda}{1 - \lambda}. \tag{14.109} \]

After a little reduction (14.107) becomes
\[ dF \propto \frac{(1 - l)^{\frac{1}{2}\{p(k - 1) - 2\}}\, \{1 + (k - 1)l\}^{\frac{1}{2}(p - 3)}\, dl}{\{1 - \lambda + (k - 1)\lambda(1 - l)\}^{\frac{1}{2}(pk - 1)}}. \tag{14.110} \]

As for the product-moment coefficient, this form may be brought closer to normality by
putting
\[ l = \tanh z, \qquad \lambda = \tanh \zeta. \]
In the particular case k = 2 we find, writing n for p,
\[ dF \propto \frac{e^{-\frac{1}{2}(z - \zeta)}\, dz}{\cosh^{\,n - \frac{1}{2}}(z - \zeta)}, \tag{14.111} \]
which has the remarkable property of depending only on $z - \zeta$, i.e. of being the same in
form for any $\zeta$ or $\lambda$. Writing $z - \zeta = x$ we may derive from (14.111) the expansion


2 2 Баз " 
к, f lah: exem) |p dz = 4 a зь ud Ex. Х|: (14.112) 
giving г эб = mie d = p*ee \ ot (ay Bev oad iis Л; 
р у 
ia = = L E 4 "n ERO 1. Dep j . (14.114) 
hum Xp ne is шап) 
= Е: jl == 5 iet is Dp ) Те) 
whence yı = = . . . . E + (14.117) 
Ys m t wo ООО E TJ ЖАЫ; 
illustrating the tendency to normality. $z - \zeta$ may be taken to be distributed normally
about zero mean with variance approximately $\dfrac{1}{n - \frac{3}{2}}$.

For the general case the substitution
\[ 2(k - 1)l = k - 2 + k \tanh(z - \theta), \qquad 2(k - 1)\lambda = k - 2 + k \tanh(\zeta - \theta), \tag{14.119} \]
where $\tanh \theta = \dfrac{k - 2}{k}$, reduces (14.110) (with n written for p) to
\[ dF \propto \exp\left\{-\tfrac{1}{2}(nk - 2n + 1)\,x\right\}\, \{\operatorname{sech}(x - \theta)\}^{\frac{1}{2}(nk - 1)}\, dx, \tag{14.120} \]
where, as usual, $x = z - \zeta$.




NOTES AND REFERENCES 


The classical theory of product-moment correlation, beginning with Galton and Karl
Pearson, was established by Yule (1897, 1907). The sampling problem for the normal case
was solved by Fisher (1915) and studied by subsequent writers, culminating in Miss David's
tables of 1938. For experimental work on the sampling distribution see E. S. Pearson
(1931). A method of deriving the distribution alternative to the geometrical approach and
relying on characteristic functions has been given by Kullback (1934).

Tetrachoric r and biserial $\eta$ are both inventions of Karl Pearson's, but the tetrachoric
series has been discovered by many writers, priority apparently being due to Mehler (1876).
For controversy on the nature and scope of tetrachoric r see references in previous chapter.

Intra-class correlation is formally equivalent to a linear function of the ratio of two
variances and thus becomes a branch of quadratic analysis (analysis of variance) which will
be dealt with in the second volume.


Co-operative Study (1917) (H. E. Soper, A. W. Young, B. M. Cave, A. Lee and K. Pearson),
"On the distribution of the correlation coefficient in small samples," Biometrika,
11, 328.
David, F. N. (1938), Tables of the Correlation Coefficient, Cambridge University Press.
Fisher, R. A. (1915), "The frequency distribution of the values of the correlation coefficient
in samples from an indefinitely large population," Biometrika, 10, 507.
Fisher, R. A. (1921), "On the probable error of a coefficient of correlation deduced from a small
sample," Metron, 1, No. 4, 3.
Kullback, S. (1934), "An application of characteristic functions to the distribution problem
of statistics," Ann. Math. Statist., 5, 263.
Mehler, G. (1876), "Reihenentwicklung nach Laplaceschen Functionen höherer Ordnung,"
J. für Math., 66, 161.
Pearson, K. (1909), "On a new method of determining correlation between a measured
characteristic A and a character B, etc.," Biometrika, 7, 96.
Pearson, K. (1910), "On a new method of determining correlation when one variable is given by
alternative and the other by multiple categories," Biometrika, 7, 248.
Pearson, K. (1911), "On the correction necessary for the correlation ratio $\eta$," Biometrika, 8, 254,
and (1923), 14, 412.
Pearson, K. (1913), "On the probable error of a coefficient of correlation as found from a fourfold
table," Biometrika, 9, 22.
Pearson, K. (1915), "On the probable error of biserial $\eta$," Biometrika, 11, 292.
Pearson, K., and Pearson, E. S. (1922), "On polychoric coefficients of correlation," Biometrika,
14, 127.
Pearson, E. S. (1931), "The test of significance for the correlation coefficient," Jour. Amer.
Statist. Ass., 26, 128, and (1932) 27, 424. See also Cheshire, L., Oldis, E., and
Pearson, E. S. (1932), ibid., 27, 121.
Ritchie-Scott, A. (1918), "The correlation coefficient of a polychoric table," Biometrika.
Yule, G. U. (1897), "On the theory of correlation," Jour. Roy. Statist. Soc., 60, 812.
Yule, G. U. (1907), "On the theory of correlation for any number of variables treated by a new
system of notation," Proc. Roy. Soc., A, 79, 182.


EXERCISES

14.1. Show that the data of Table 1.25 have the following constants (x = age,
y = milk yield):

    mean x = 6.22 years           mean y = 18.61 gallons,
    $\sqrt{\operatorname{var} x}$ = 2.21 years       $\sqrt{\operatorname{var} y}$ = 3.37 gallons,
    $\rho = 0.219$,   $\eta_{xy} = 0.242$,   $\eta_{yx} = 0.266$.

14.2. Show that for the data of Table 14.2
\[ \rho = 0.012, \qquad \eta_{xy} = 0.14, \qquad \eta_{yx} = 0.38. \]

14.3. Show that the smaller angle between the regression lines is
\[ \arctan\left\{\frac{1 - \rho^2}{\rho} \cdot \frac{\sigma_1 \sigma_2}{\sigma_1^2 + \sigma_2^2}\right\}. \]

14.4. If a bivariate normal surface is dichotomised at its medians and $\alpha$ is the
proportional frequency in the positive compartment of the 2 x 2 table so generated (i.e. the
compartment including the limits $+\infty$), show that
\[ \rho = \cos\,(1 - 2\alpha)\pi. \]

(Sheppard, Phil. Trans. Roy. Soc., 1898, 192A, 101.)


14.5. Show that the ordinates of the sampling distribution of the correlation coefficient
r in samples from a normal parent with correlation $\rho$ obey the recurrence relation


" 2n —1 n —1 
2 
n+ a % 


+ d — yn» 
where n is the sample number and 
a Pr V — p?) VU — r9) 
52 2.8 D 
І — pi 


@ = өз) — тз) 
> Z . 
І р? 
(Co-operative Study, 1917.) 


14.6. By the transformation cosh z — pr = a 


Т 6 show that the ordinate of the 
distribution of r may be expressed as 


n—1 n—4 
т —2 (1 — р?) 


j= E Ти = 1) 
n 2n—3 7 
y (27) (ey In — 1) 
I dau л: a? .82, a? i } 
| Ba} ° 22.0206 pny 32.989 @ — 1)% + 304 3) : 
where х = EC E j 


14.7. Show that the characteristic function of
\[ \theta_1 = \frac{\Sigma x^2}{2(1 - \rho^2)\sigma_1^2}, \qquad \theta_2 = \frac{\rho\,\Sigma xy}{(1 - \rho^2)\sigma_1\sigma_2}, \qquad \theta_3 = \frac{\Sigma(y^2)}{2(1 - \rho^2)\sigma_2^2} \]
in normal samples is
\[ \frac{(1 - \rho^2)^{\frac{1}{2}n}}{\{(1 - it_1)(1 - it_3) - \rho^2(1 + it_2)^2\}^{\frac{1}{2}n}}, \]
where $t_1$ refers to $\theta_1$, and so on. Hence show that the distribution of variances and
covariance has the same characteristic function, except for constants, but with the exponent
n reduced by unity. Show that the simultaneous distribution of these quantities is then
that of equation (14.42) with $\theta_1 = na$, $\theta_2 = n\rho b$, $\theta_3 = nc$.

(Kullback, 1934.)


14.8. From the distribution of equation (14.42) show that the distribution of 


v= aie and r is given by 

оу] Og n—4 
SUM PY dri 
ФР e Maro + vij 
Integrating for r from — 1 to +1 by putting 
ulh + p) — (1 — à — и) 
T7 wA + uw) + —u- a) 

Show that the distribution of v is 


А=1 +v, р = ро, 


n—l 
2(1 — р?) 2 pn-2 4p?v* He 
a aT E 1— 2 
Ee в(" = 1 n= 5) (1 + vmi (1 +02) 
DE г 


4 


(This gives the distribution of the variance ratio when the variates are correlated. 
The result is due to S. S. Bose (1935), Sankhya, 2, 65. The derivation was given by 
Finney, Biometrika, 1938, 30, 190.) 


14.9. Show that in samples from a normal bivariate population the variance of b, is 
given exactly by ; i 
оў 
2) = ———(1-—p? 
var (bs) = те 4 (1 — p) 
and that for the distribution of ba 


14.10. By considering the joint distribution of s, and b, in normal samples, show that 


the regression of b, on s, is linear, but that of s, on b, is not linear and does not tend to 
linearity for large samples. 


14.11. Writing the bivariate frequency function in the form
\[ F(x, y) = f(x)\, g_x(y), \]
so that the jth moment about the origin of the y array for given x is
\[ \mu_{jx}' = \int_{-\infty}^{\infty} y^j\, g_x(y)\, dy, \]
show that
\[ \left[\frac{\partial^j \phi(t, u)}{\partial u^j}\right]_{u=0} = i^j \int_{-\infty}^{\infty} e^{itx} f(x)\, \mu_{jx}'\, dx \]
(where $\phi$ is the characteristic function of the distribution) so that
\[ f(x)\, \mu_{jx}' = \frac{(-i)^j}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \left[\frac{\partial^j \phi}{\partial u^j}\right]_{u=0} dt. \]

Verify that the bivariate normal distribution has linear regressions and is homoscedastic.


14.12. (Data of E. M. Elderton, quoted by K. Pearson, 1910.) The following table
shows 811 sons classified according to alcoholism of parent and health of son:

                                              Son.
    Parent.          Healthy.   Fairly     Delicate.   Phthisical or   Died     TOTALS.
                                healthy.               epileptic.      young.
    Alcoholic          122         9          24            8            42       205
    Non-alcoholic      328        37          71           37           133       606
    TOTALS             450        46          95           45           175       811

Show that biserial $\eta = 0.089$, indicating little correlation between health of son and
consumption of alcohol by parent.


14.13. (Data from O. H. Latter, Biometrika, 4, 1905, p. 363.)
The following table shows the length of cuckoos' eggs fostered by various birds:

                             Length of Egg (units ½ millimetre).
    Foster Parent.    40   41   42   43   44   45   46   47   48   49   50   TOTALS.
    Robin              1    1    8    3    9   18   20    6    6    2    2      76
    Wren               7    5   14    8    9    6    3    2    -    -    -      54
    Hedge-Sparrow      -    -    2    5   14   13   13    3    5    -    3      58
    TOTALS             8    6   24   16   32   37   36   11   11    2    5     188

Show that the coefficient of intra-class correlation is + 0.22.


14.14. A series of measurements are subject to errors of observation which may be
supposed uncorrelated with the magnitudes of the measurements. If $x_1$, $y_1$ refer to the
observed deviations from arithmetic means and x, y to the true deviations, show that
$\Sigma(x_1 y_1) = \Sigma(xy)$, but that var $x_1$ > var x; var $y_1$ > var y. Hence show that the observed
correlation is less than the true correlation.

14.15. If three variables $X_1$, $X_2$, $X_3$ are uncorrelated and the deviations are small
compared with their mean values $M_1$, $M_2$ and $M_3$, show that the variance of $\dfrac{X_1}{X_3}$ is
approximately
\[ \frac{M_1^2}{M_3^2}\left\{\frac{\operatorname{var} X_1}{M_1^2} - \frac{2\operatorname{cov}(X_1, X_3)}{M_1 M_3} + \frac{\operatorname{var} X_3}{M_3^2}\right\} \]
and that the correlation between $\dfrac{X_1}{X_3}$ and $\dfrac{X_2}{X_3}$ is
\[ \frac{v_3^2}{\sqrt{(v_1^2 + v_3^2)(v_2^2 + v_3^2)}}, \]
where $v_1^2 = \dfrac{\operatorname{var} X_1}{M_1^2}$, etc.

Note that this is positive, so that there is a "spurious" correlation between the two
indices $\dfrac{X_1}{X_3}$ and $\dfrac{X_2}{X_3}$.
3 X; 


CHAPTER 15 
PARTIAL AND MULTIPLE CORRELATION 


15.1. The product-moment coefficient of correlation can, as has been seen in the
last chapter, be used to measure the relationship between two variates which are distributed
either exactly or approximately in the normal form. When we come to interpret such
a correlation, however, we meet the same sort of problem which arose in Chapter 13 in
connection with associations: if a variate 1 is correlated with a variate 2, may this not
be due to the fact that both are correlated with a variate 3? The question may be decided
by considering the correlation of 1 and 2 in the sub-populations for which variate 3 is
constant, and in this chapter we consider the theory of such partial correlations, which
bear an obvious analogy to the partial associations of Chapter 13. The subject may best
be broached by extending to several variables the theory of linear regression developed
for two variables in the previous chapter.


15.2. Suppose, in fact, that there is given a set of N individuals considered according to p variates $x_1, x_2, \ldots, x_p$, so that to each individual there correspond p variate-values. We may, for example, be given a set of men according to height, weight, age and income, or a set of counties according to wheat-yields, hours of sunshine per annum, inches of rainfall per annum, and mean height above sea-level. In general, any variate may be considered as dependent on the others and for any variate, say $x_1$, we may require to find the "best" linear relation of the form

$$X_1 = \alpha + \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_p X_p, \tag{15.1}$$

a generalisation of (14.8). As before, the constants may be determined by the principle of least squares, i.e. so that

$$U = \Sigma (x_1 - \alpha - \beta_2 x_2 - \ldots - \beta_p x_p)^2 \tag{15.2}$$

is a minimum, the summation extending over the N members of the population. We shall then have

$$\frac{\partial U}{\partial \alpha} = -2\,\Sigma (x_1 - \alpha - \beta_2 x_2 - \ldots - \beta_p x_p) = 0, \tag{15.3}$$

and if we take the variables measured from their means, this reduces to $\alpha = 0$. With this convention we have (p − 1) equations of type $\partial U/\partial \beta_j = 0$, i.e.

$$\Sigma\, x_j (x_1 - \beta_2 x_2 - \ldots - \beta_p x_p) = 0$$

or

$$\operatorname{cov}(x_1, x_j) - \beta_2 \operatorname{cov}(x_2, x_j) - \ldots - \beta_j \operatorname{var} x_j - \ldots - \beta_p \operatorname{cov}(x_p, x_j) = 0. \tag{15.4}$$

These (p − 1) equations can be solved for the (p − 1) quantities β and hence the regression form (15.1) is determinate.
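The solution of the normal equations (15.4) can be sketched numerically. The following is a minimal illustration in Python with NumPy, not part of the original text; the function name `regression_coefficients` and the simulated data are assumptions made purely for the example. The variates are centred on their means, the covariances are formed with divisor N, and the (p − 1) linear equations are solved for the β's.

```python
import numpy as np

def regression_coefficients(data):
    """Least-squares coefficients of column 0 on the remaining columns.

    data is an (N, p) array; the variates are first measured from their means,
    so that alpha = 0 as in the text. (Hypothetical helper for illustration.)
    """
    x = data - data.mean(axis=0)
    cov = x.T @ x / len(x)                  # covariance matrix, divisor N
    # Normal equations (15.4): cov[1:, 1:] @ beta = cov[1:, 0]
    return np.linalg.solve(cov[1:, 1:], cov[1:, 0])

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 3))
x1 = 2.0 * z[:, 0] - 1.0 * z[:, 1] + 0.5 * z[:, 2]   # an exact linear relation
data = np.column_stack([x1, z])
beta = regression_coefficients(data)
print(beta)
```

Because the first variate was constructed as an exact linear function of the others, the solved coefficients reproduce the constants used to build it.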


15.3. In the notation introduced by Yule we write

$$X_1 = \beta_{12.34\ldots p} X_2 + \beta_{13.24\ldots p} X_3 + \ldots + \beta_{1p.23\ldots(p-1)} X_p, \tag{15.5}$$

which is the regression equation of $X_1$ on $X_2 \ldots X_p$ referred to the means of the X's. The quantities β are called Partial Regression Coefficients. The first subscript to the left of the period in each β is that of the variate on the left of the regression equation, and the second subscript is that of the variate to which it is attached. These are called Primary Subscripts. The subscripts on the right of the period are those of the remaining variables and are called Secondary Subscripts.

When no confusion is likely to arise we can write (15.5) in the simpler form

$$X_1 = \beta_2 X_2 + \beta_3 X_3 + \ldots + \beta_p X_p, \tag{15.6}$$

that is to say, we may drop the first primary and the secondary subscripts. The order of the primary subscripts is material, $\beta_{1p}$ being different from $\beta_{p1}$; but that of the secondary subscripts is not.

Write

$$x_{1.23\ldots p} = x_1 - \beta_{12.34\ldots p}\, x_2 - \ldots - \beta_{1p.23\ldots(p-1)}\, x_p. \tag{15.7}$$

$x_{1.23\ldots p}$ may then be called the residual of $x_1$ of order p − 1. It is the difference between the observed $x_1$ and the value given by the regression equation. If all the residuals are zero, and only in this case, the regression is exactly linear. The β's were determined so as to make the sum of squares of residuals a minimum.

Write also

$$\operatorname{var}(x_{1.23\ldots p}) = \sigma^2_{1.23\ldots p}, \tag{15.8}$$

so that $\sigma_{1.23\ldots p}$ is the standard deviation of residuals and corresponds to the standard deviation of arrays considered in 14.22.


15.4. From (15.7), equations (15.4) may be written

$$\Sigma (x_j\, x_{1.23\ldots p}) = 0, \qquad j = 2, \ldots, p, \tag{15.9}$$

and generally we shall have

$$\Sigma (x_k\, x_{j.12\ldots(j-1)(j+1)\ldots p}) = 0, \qquad j \ne k, \tag{15.10}$$

i.e. the covariance of any residual and any variate is zero, provided that the subscript of the latter occurs among the secondary subscripts of the former.

More generally still,

$$\Sigma (x_{1.34\ldots p}\, x_{2.34\ldots p}) = \Sigma \{ x_{1.34\ldots p} (x_2 - \beta_{23.4\ldots p}\, x_3 - \ldots - \beta_{2p.34\ldots(p-1)}\, x_p) \}$$

and each term on the right vanishes in virtue of (15.9) except the first, so that

$$\Sigma (x_{1.34\ldots p}\, x_{2.34\ldots p}) = \Sigma (x_{1.34\ldots p}\, x_2) \tag{15.11}$$

$$= \Sigma (x_1\, x_{2.34\ldots p}) \tag{15.12}$$

by symmetry.

Thus the covariance of any two residuals is unaltered by omitting any or all of the secondary subscripts of either which are common to both. Conversely the covariance of any residual with p secondary suffixes and a residual with those p secondary suffixes and q additional ones is unaltered by adding to the former any of the q of the latter.

As a corollary, any covariance is zero if all the subscripts of one residual occur among the secondary subscripts of the second.
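The orthogonality relations (15.9) and the identities (15.11) and (15.12) can be checked numerically. The sketch below is illustrative only: the helper `residual`, the simulated data and the zero-based column indices (column 0 standing for $x_1$, column 1 for $x_2$, and so on) are assumptions of this example, not notation from the text.

```python
import numpy as np

def residual(data, j, others):
    """Residual of column j after least-squares regression on the columns in
    `others`, all variates being measured from their means. (Hypothetical helper.)"""
    x = data - data.mean(axis=0)
    if not others:
        return x[:, j]
    a = x[:, others]
    coef, *_ = np.linalg.lstsq(a, x[:, j], rcond=None)
    return x[:, j] - a @ coef

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))   # correlated variates
x = data - data.mean(axis=0)

r1 = residual(data, 0, [2, 3])      # x_{1.34}
r2 = residual(data, 1, [2, 3])      # x_{2.34}

# (15.9): a residual is uncorrelated with every variate among its secondary subscripts
ortho = [float(r1 @ x[:, k]) for k in (2, 3)]

# (15.11) and (15.12): the three sums of products coincide
s_both = float(r1 @ r2)             # sum of x_{1.34} x_{2.34}
s_first = float(r1 @ x[:, 1])       # sum of x_{1.34} x_2
s_second = float(x[:, 0] @ r2)      # sum of x_1 x_{2.34}
print(ortho, s_both, s_first, s_second)
```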


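15.5. In virtue of these results we have

$$0 = \Sigma (x_{2.34\ldots p}\, x_{1.23\ldots p})$$
$$= \Sigma \{ x_{2.34\ldots p} (x_1 - \beta_{12.34\ldots p}\, x_2 - \text{terms in } x_3, \ldots, x_p) \}$$
$$= \Sigma (x_{2.34\ldots p}\, x_1) - \beta_{12.34\ldots p}\, \Sigma (x_{2.34\ldots p}\, x_2)$$
$$= \Sigma (x_{1.34\ldots p}\, x_{2.34\ldots p}) - \beta_{12.34\ldots p}\, \Sigma (x^2_{2.34\ldots p}),$$

and thus, writing q for the group of suffixes 34 . . . p, we have

$$\beta_{12.q} = \frac{\operatorname{cov}(x_{1.q}, x_{2.q})}{\operatorname{var}(x_{2.q})}, \tag{15.13}$$

a generalisation of (14.6). Similarly

$$\beta_{21.q} = \frac{\operatorname{cov}(x_{1.q}, x_{2.q})}{\operatorname{var}(x_{1.q})}. \tag{15.14}$$

We may then define a coefficient $\rho_{12.q} = \rho_{12.34\ldots p}$ by the equation

$$\rho_{12.q} = (\beta_{12.q}\, \beta_{21.q})^{1/2} = \frac{\operatorname{cov}(x_{1.q}, x_{2.q})}{\{\operatorname{var}(x_{1.q}) \operatorname{var}(x_{2.q})\}^{1/2}}. \tag{15.15}$$

This is a generalisation of (14.10). $\rho_{12.q}$ is evidently the product-moment coefficient of correlation between $x_{1.q}$ and $x_{2.q}$.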


15.6. From p variates we can pick out two in $\binom{p}{2}$ ways and find the regression of each on the other and their correlation; we can also pick out three in $\binom{p}{3}$ ways and find the regression of each on the other two; and so on. The number of possible regressions and correlations is thus very large, but they can all be expressed in terms of the variances of the variates and the correlations between pairs.

We shall call the coefficients with k secondary subscripts regressions, correlations, etc., of the kth order. The correlation between a pair of variates $\rho_{jk}$ is thus of zero order, and our result may be stated in the form that coefficients of any order are expressible in terms of those of zero order. The proof follows from the expressions which we proceed to derive, giving coefficients of any order in terms of those of lower order. We have

$$\Sigma (x^2_{1.23\ldots p}) = \Sigma (x_{1.23\ldots(p-1)}\, x_{1.23\ldots p})$$
$$= \Sigma\, x_{1.23\ldots(p-1)} (x_1 - \beta_{1p.23\ldots(p-1)}\, x_p - \text{terms in } x_2 \text{ to } x_{p-1})$$
$$= \Sigma (x^2_{1.23\ldots(p-1)}) - \beta_{1p.23\ldots(p-1)}\, \Sigma (x_{1.23\ldots(p-1)}\, x_{p.23\ldots(p-1)});$$

hence, dividing by N,

$$\operatorname{var}(x_{1.23\ldots p}) = \operatorname{var}(x_{1.23\ldots(p-1)}) - \beta_{1p.23\ldots(p-1)}\, \beta_{p1.23\ldots(p-1)} \operatorname{var}(x_{1.23\ldots(p-1)})$$
$$= \operatorname{var}(x_{1.23\ldots(p-1)}) \,(1 - \rho^2_{1p.23\ldots(p-1)}), \tag{15.16}$$

which may be regarded as a generalisation of (14.11). By continuing the process we have

$$\operatorname{var}(x_{1.23\ldots p}) = \operatorname{var}(x_1)(1 - \rho^2_{12})(1 - \rho^2_{13.2})(1 - \rho^2_{14.23}) \ldots (1 - \rho^2_{1p.23\ldots(p-1)}) \tag{15.17}$$

or

$$\frac{\sigma^2_{1.23\ldots p}}{\sigma^2_1} = (1 - \rho^2_{12})(1 - \rho^2_{13.2}) \ldots (1 - \rho^2_{1p.23\ldots(p-1)}). \tag{15.18}$$

The subscripts of the ρ's can be eliminated in a different order, giving alternative forms such as

$$\frac{\sigma^2_{1.23\ldots p}}{\sigma^2_1} = (1 - \rho^2_{13})(1 - \rho^2_{12.3})(1 - \rho^2_{14.23}) \ldots, \text{ etc.}$$

Thus the variance of a residual of order p − 1 is expressible in terms of the variance of zero order and correlations of orders up to p − 2.


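15.7. Equations (15.4) may be written

$$\rho_{12}\sigma_1\sigma_2 - \beta_{12.34\ldots p}\,\sigma_2^2 - \beta_{13.24\ldots p}\,\rho_{23}\sigma_2\sigma_3 - \ldots = 0$$
$$\rho_{13}\sigma_1\sigma_3 - \beta_{12.34\ldots p}\,\rho_{23}\sigma_2\sigma_3 - \beta_{13.24\ldots p}\,\sigma_3^2 - \ldots = 0$$

etc. Adding the expression for $\sigma^2_{1.23\ldots p}$, i.e.

$$\sigma_1^2 - \sigma^2_{1.23\ldots p} - \beta_{12.34\ldots p}\,\rho_{12}\sigma_1\sigma_2 - \beta_{13.24\ldots p}\,\rho_{13}\sigma_1\sigma_3 - \ldots = 0,$$

we have p equations from which, on elimination of the β's, there results

$$\begin{vmatrix} \sigma_1^2 - \sigma^2_{1.23\ldots p} & \rho_{12}\sigma_1\sigma_2 & \rho_{13}\sigma_1\sigma_3 & \ldots \\ \rho_{21}\sigma_2\sigma_1 & \sigma_2^2 & \rho_{23}\sigma_2\sigma_3 & \ldots \\ \rho_{31}\sigma_3\sigma_1 & \rho_{32}\sigma_3\sigma_2 & \sigma_3^2 & \ldots \\ \ldots & \ldots & \ldots & \ldots \end{vmatrix} = 0,$$

where, of course, $\rho_{jk} = \rho_{kj}$. Dividing the jth row by $\sigma_j$ and the kth column by $\sigma_k$, we get

$$\begin{vmatrix} 1 - \dfrac{\sigma^2_{1.23\ldots p}}{\sigma_1^2} & \rho_{12} & \rho_{13} & \ldots & \rho_{1p} \\ \rho_{12} & 1 & \rho_{23} & \ldots & \rho_{2p} \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ \rho_{1p} & \rho_{2p} & \rho_{3p} & \ldots & 1 \end{vmatrix} = 0. \tag{15.19}$$

Write

$$\omega = \begin{vmatrix} 1 & \rho_{12} & \rho_{13} & \ldots & \rho_{1p} \\ \rho_{12} & 1 & \rho_{23} & \ldots & \rho_{2p} \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ \rho_{1p} & \rho_{2p} & \rho_{3p} & \ldots & 1 \end{vmatrix} \tag{15.20}$$

and $\omega_{11}$ for the minor of the first row and column of this determinant. Then from (15.19)

$$\omega - \frac{\sigma^2_{1.23\ldots p}}{\sigma_1^2}\,\omega_{11} = 0,$$

i.e.

$$\sigma^2_{1.23\ldots p} = \sigma_1^2\, \frac{\omega}{\omega_{11}}. \tag{15.21}$$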


Generally it may be shown in exactly the same way that 


COW (Eis леа copy CLR UR CT, PEERS р) 
— 9160 
Olm 
(15.22)

where $\omega_{lm}$ is the minor of the lth row and mth column in (15.20).

This result shows that the variances and covariances of residuals of any order can be expressed in terms of the correlations and variances of zero order.


15.8. We have, as in (15.16),

$$\Sigma (x_{1.34\ldots p}\, x_{2.34\ldots p}) = \Sigma (x_{1.34\ldots(p-1)}\, x_{2.34\ldots(p-1)}) - \beta_{2p.34\ldots(p-1)}\, \Sigma (x_{1.34\ldots(p-1)}\, x_{p.34\ldots(p-1)}).$$

Substituting

$$\beta_{2p.34\ldots(p-1)} = \beta_{p2.34\ldots(p-1)}\, \frac{\sigma^2_{2.34\ldots(p-1)}}{\sigma^2_{p.34\ldots(p-1)}}$$

and the expressions for the covariances in terms of variances and regressions, and writing q for the group of secondary suffixes 34 . . . (p − 1), we find

$$\beta_{12.qp}\, \sigma^2_{2.qp} = \beta_{12.q}\, \sigma^2_{2.q} - \beta_{1p.q}\, \beta_{p2.q}\, \sigma^2_{2.q},$$

whence, in virtue of (15.15),

$$\beta_{12.qp} = \frac{\beta_{12.q} - \beta_{1p.q}\, \beta_{p2.q}}{1 - \beta_{2p.q}\, \beta_{p2.q}}, \tag{15.23}$$

expressing the partial regression coefficient in terms of those of next lower order.

Writing down the similar equation for $\beta_{21.qp}$ and taking square roots, we find

$$\rho_{12.qp} = \frac{\rho_{12.q} - \rho_{1p.q}\, \rho_{2p.q}}{\{(1 - \rho^2_{1p.q})(1 - \rho^2_{2p.q})\}^{1/2}}, \tag{15.24}$$

a fundamental equation giving the correlation coefficient in terms of those of lower orders.
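Equation (15.24) lends itself to a recursive computation of a partial correlation of any order from the correlations of zero order. The following Python sketch is offered as an illustration only; the function `partial_corr`, the zero-based indices and the simulated data are assumptions of the example. As a cross-check it uses the standard identity (not derived in the text) that, when all remaining variates are held constant, the partial correlation equals minus the corresponding element of the inverse correlation matrix divided by the geometric mean of the two diagonal elements.

```python
import numpy as np

def partial_corr(rho, i, j, given=()):
    """Partial correlation rho_{ij.given}, built up by repeated use of (15.24).

    rho is the matrix of zero-order correlations; indices are zero-based.
    (Hypothetical helper for illustration.)
    """
    given = tuple(given)
    if not given:
        return float(rho[i, j])
    *rest, p = given                      # eliminate the last secondary subscript
    r_ij = partial_corr(rho, i, j, rest)
    r_ip = partial_corr(rho, i, p, rest)
    r_jp = partial_corr(rho, j, p, rest)
    return (r_ij - r_ip * r_jp) / np.sqrt((1 - r_ip**2) * (1 - r_jp**2))

rng = np.random.default_rng(3)
data = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
rho = np.corrcoef(data, rowvar=False)

r12_34 = partial_corr(rho, 0, 1, (2, 3))
r12_43 = partial_corr(rho, 0, 1, (3, 2))   # order of secondary subscripts is immaterial

# Cross-check through the inverse correlation matrix (standard identity, assumed here)
inv = np.linalg.inv(rho)
r12_check = -inv[0, 1] / np.sqrt(inv[0, 0] * inv[1, 1])
print(r12_34, r12_43, r12_check)
```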


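15.9. From the above results it appears that all partial correlations and variances or covariances of residuals can be expressed in terms of variances and regressions (or correlations) of zero order. It is interesting to consider this result from the geometrical point of view.

Suppose we have N sets of observations of p variates

$$x_{11}, \ldots, x_{1p};\quad x_{21}, \ldots, x_{2p};\quad \ldots;\quad x_{N1}, \ldots, x_{Np}.$$

Consider a Euclidean (flat) space of N dimensions. To the N observations of each variate there will correspond one point in this space, and the points corresponding to the observations will be p in number. (Note that this is not the representation by N points in a p-way space; cf. the sampling discussions in Chapter 10.) Call these points $Q_1, Q_2, \ldots, Q_p$. We will assume that the x's are measured about their mean, and take the origin to be P.

The quantity $N\sigma_l^2$ may then be interpreted as the square of the length of the vector joining the point $Q_l\,(x_{1l}, \ldots, x_{Nl})$ to P. Similarly $\rho_{lm}$ may be interpreted as the cosine of the angle $Q_l P Q_m$, for

$$\rho_{lm} = \frac{\Sigma_j\, x_{jl}\, x_{jm}}{N \sigma_l \sigma_m},$$

which is the formula for the cosine of the angle between $PQ_l$ and $PQ_m$.

Our result may then be expressed by saying that all the relations connecting the p points in the N-space are expressible in terms of the lengths of the vectors PQ and the angles between them.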
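The geometrical interpretation can be verified directly: with variates measured from their means, the squared length of the vector is N times the variance, and the cosine of the angle between two vectors is the product-moment correlation. The short Python sketch below uses simulated data and variable names that are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 30
x = rng.normal(size=N)
y = 0.6 * x + rng.normal(size=N)

# Vectors from the origin P, the variates being measured about their means
u = x - x.mean()
v = y - y.mean()

length_sq = u @ u                                   # squared length of PQ
cos_angle = (u @ v) / np.sqrt((u @ u) * (v @ v))    # cosine of the angle Q_l P Q_m
rho = np.corrcoef(x, y)[0, 1]                       # product-moment correlation
print(length_sq, N * x.var(), cos_angle, rho)
```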


15.10. The reader who prefers the geometrical way of looking at this branch of the subject will have no difficulty in translating the foregoing equations into trigonometrical terminology. We will here indicate only the more important results required for later sampling investigations.

Note in the first place that the p points Q and the point P determine (except perhaps in degenerate cases) a space of p dimensions in the N-space. Consider the point $Q_{1.2\ldots p}$ whose co-ordinates are the N residuals $x_{1.2\ldots p}$. In virtue of (15.9) the vector $PQ_{1.2\ldots p}$ is orthogonal to each of the vectors $PQ_2, \ldots, PQ_p$ and hence to the space of (p − 1) dimensions defined by $P, Q_2, \ldots, Q_p$.

Consider now the residual vectors $Q_{1.q}$, $Q_{2.q}$, where q represents the secondary suffixes 34 . . . (p − 1). The cosine of the angle between them, say θ, is $\rho_{12.q}$ and each is orthogonal to the space $P, Q_3, \ldots, Q_{(p-1)}$. Now take M on $PQ_p$ such that $MQ_{1.q}$ and $MQ_{2.q}$ are perpendicular to $PQ_p$. Then $MQ_{1.q}$ is perpendicular to the space $P, Q_3, \ldots, Q_p$ and so is $MQ_{2.q}$. The cosine of the angle between them, say φ, is $\rho_{12.qp}$ (cf. Fig. 15.1). Thus, to express $\rho_{12.qp}$ in terms of $\rho_{12.q}$ we have to express φ in terms of θ, or the angle between the vectors $PQ_{1.q}$ and $PQ_{2.q}$ in terms of that between their projections on the hyperplane perpendicular to $PQ_p$.

[Fig. 15.1]

We have

$$(Q_{1.q} Q_{2.q})^2 = PQ_{1.q}^2 + PQ_{2.q}^2 - 2\, PQ_{1.q}\, PQ_{2.q} \cos\theta$$
$$= MQ_{1.q}^2 + MQ_{2.q}^2 - 2\, MQ_{1.q}\, MQ_{2.q} \cos\phi$$

and

$$PQ_{1.q}^2 = PM^2 + MQ_{1.q}^2, \qquad PQ_{2.q}^2 = PM^2 + MQ_{2.q}^2,$$

and hence we find

$$MQ_{1.q}\, MQ_{2.q} \cos\phi = -PM^2 + PQ_{1.q}\, PQ_{2.q} \cos\theta$$

or

$$\frac{MQ_{1.q}}{PQ_{1.q}} \cdot \frac{MQ_{2.q}}{PQ_{2.q}} \cos\phi = \cos\theta - \frac{PM}{PQ_{1.q}} \cdot \frac{PM}{PQ_{2.q}}. \tag{15.25}$$

Now $MQ_{1.q}/PQ_{1.q}$ is the sine of the angle $MPQ_{1.q}$, the cosine of which angle is $\rho_{1p.q}$. Substituting in (15.25) we find

$$\cos\phi = \frac{\cos\theta - \rho_{1p.q}\, \rho_{2p.q}}{\{(1 - \rho^2_{1p.q})(1 - \rho^2_{2p.q})\}^{1/2}}, \tag{15.26}$$

which is equation (15.24) in a slightly different form. The expression of a correlation coefficient in terms of those of the next lowest order is thus capable of interpretation as the projection of an angle on to a space of one fewer dimensions.


Example 15.1

In an investigation into the relationship between weather and crops, Hooker (1907) found the following means, standard deviations and correlations between the yields of seeds' hay ($x_1$) in cwts. per acre, the spring rainfall ($x_2$) in inches and the accumulated temperature above 42°F. in the spring ($x_3$) for an English area over 20 years:

    Mean                  Standard deviation     Correlation
    $\bar{x}_1$ = 28.02    $\sigma_1$ = 4.42      $\rho_{12}$ = +0.80
    $\bar{x}_2$ = 4.91     $\sigma_2$ = 1.10      $\rho_{13}$ = −0.40
    $\bar{x}_3$ = 594      $\sigma_3$ = 85        $\rho_{23}$ = −0.56


The question of primary interest here is the influence of weather on crop yields, and 
we consider only the regression of x, on the other two variates. From the correlations 
of zero order it appears that yield and rainfall are positively correlated but that yield and 
accumulated spring temperature are negatively correlated. The question is, what interpretation is to be placed on this latter result? Does high temperature adversely affect yields or may the negative correlation be due to the fact that high temperature involves less rain, so that the beneficial effect of warmth is more than offset by the harmful effect of drought?


To decide this question, let us calculate the partial correlations and regressions. From (15.24) we have

$$\rho_{12.3} = \frac{\rho_{12} - \rho_{13}\rho_{23}}{\sqrt{(1 - \rho_{13}^2)(1 - \rho_{23}^2)}} = \frac{0.80 - (-0.40)(-0.56)}{\sqrt{\{1 - (0.40)^2\}\{1 - (0.56)^2\}}} = 0.759.$$

Similarly

$$\rho_{13.2} = 0.097, \qquad \rho_{23.1} = -0.436.$$

We next require the regressions β and the variances of residuals. From (15.13) we have

$$\beta_{12.3} = \frac{\operatorname{cov}(x_{1.3}, x_{2.3})}{\operatorname{var} x_{2.3}} = \frac{\rho_{12.3}\, \sigma_{1.3}}{\sigma_{2.3}}.$$

This, however, involves the calculation of $\sigma_{1.3}$ and $\sigma_{2.3}$ which are not in themselves of interest. We can obviate the process by noting that from (15.16)

$$\sigma_{1.23} = \sigma_{1.3} (1 - \rho^2_{12.3})^{1/2}$$
$$\sigma_{2.13} = \sigma_{2.3} (1 - \rho^2_{12.3})^{1/2}$$

so that

$$\beta_{12.3} = \frac{\rho_{12.3}\, \sigma_{1.23}}{\sigma_{2.13}}.$$

The standard deviations $\sigma_{1.23}$ and $\sigma_{2.13}$ are of some interest and may be calculated from (15.18). We have

$$\sigma_{1.23} = \sigma_1 (1 - \rho^2_{12})^{1/2} (1 - \rho^2_{13.2})^{1/2}$$
$$= \sigma_1 (1 - \rho^2_{13})^{1/2} (1 - \rho^2_{12.3})^{1/2},$$

the two forms offering a check on each other.

From the first we have

$$\sigma_{1.23} = 4.42\, \{1 - (0.8)^2\}^{1/2} \{1 - (0.097)^2\}^{1/2} = 2.64.$$

Similarly

$$\sigma_{2.13} = 0.594, \qquad \sigma_{3.12} = 70.1.$$

Thus

$$\beta_{12.3} = \frac{(0.759)(2.64)}{0.594} = 3.37,$$

and we also find

$$\beta_{13.2} = 0.00364.$$

The regression equation of $X_1$ on $X_2$ and $X_3$ is then

$$X_1 - 28.02 = 3.37 (X_2 - 4.91) + 0.00364 (X_3 - 594).$$

This equation shows that for increasing rainfall the yield increases and that for increasing temperature the yield also increases, other things being equal. It enables us to isolate the effects of rainfall from those of temperature and study each separately. The positive regression $\beta_{13.2}$ means that there is a positive relation between yield and temperature when the effect of rainfall is eliminated. The partial correlations tell the same story. Although $\rho_{13}$ is negative, $\rho_{13.2}$ is positive (though small), indicating that the negative value of $\rho_{13}$ is due to complications introduced by the rainfall factor.

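The foregoing procedure avoids the use of determinantal arithmetic, but the reader who prefers to do so may use equations (15.21). For example:

$$\omega = \begin{vmatrix} 1 & 0.80 & -0.40 \\ 0.80 & 1 & -0.56 \\ -0.40 & -0.56 & 1 \end{vmatrix} = 0.2448$$

$$\omega_{11} = \begin{vmatrix} 1 & -0.56 \\ -0.56 & 1 \end{vmatrix} = 0.6864,$$

from which

$$\sigma_{1.23} = \sigma_1 \left(\frac{\omega}{\omega_{11}}\right)^{1/2} = 2.64 \text{ as before.}$$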
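The arithmetic of Example 15.1 can be reproduced in a few lines. The Python sketch below takes Hooker's standard deviations and zero-order correlations as quoted in the example; the helper names `first_order` and `resid_sd` and the zero-based indexing are assumptions of this illustration. The partial correlations come from (15.24), the residual standard deviations from the determinantal form (15.21), and the regressions from the relation $\beta_{12.3} = \rho_{12.3}\sigma_{1.23}/\sigma_{2.13}$.

```python
import numpy as np

sigma = np.array([4.42, 1.10, 85.0])            # sigma_1, sigma_2, sigma_3
rho = np.array([[1.00, 0.80, -0.40],
                [0.80, 1.00, -0.56],
                [-0.40, -0.56, 1.00]])          # zero-order correlations

def first_order(rho, i, j, k):
    """rho_{ij.k} from equation (15.24). (Hypothetical helper.)"""
    return (rho[i, j] - rho[i, k] * rho[j, k]) / np.sqrt(
        (1 - rho[i, k] ** 2) * (1 - rho[j, k] ** 2))

def resid_sd(rho, sigma, i):
    """sigma_{i.rest} = sigma_i * sqrt(omega / omega_ii), as in (15.21)."""
    omega = np.linalg.det(rho)
    minor = np.linalg.det(np.delete(np.delete(rho, i, axis=0), i, axis=1))
    return sigma[i] * np.sqrt(omega / minor)

r12_3 = first_order(rho, 0, 1, 2)
r13_2 = first_order(rho, 0, 2, 1)
r23_1 = first_order(rho, 1, 2, 0)

s1_23, s2_13, s3_12 = (resid_sd(rho, sigma, i) for i in range(3))

b12_3 = r12_3 * s1_23 / s2_13                   # regression of yield on rainfall
b13_2 = r13_2 * s1_23 / s3_12                   # regression of yield on temperature

print(round(r12_3, 3), round(r13_2, 3), round(r23_1, 3))
print(round(s1_23, 2), round(s2_13, 3), round(s3_12, 1))
print(round(b12_3, 2), round(b13_2, 5))
```

Rounded to the number of figures used in the example, these quantities agree with the values 0.759, 0.097, −0.436, 2.64, 0.594, 70.1, 3.37 and 0.00364 given in the text.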
11 


15.11. When the work involves more than three variables it is desirable to systematise the arithmetic. Considerable assistance may be derived from tables of quantities such as

$$1 - \rho^2, \qquad \sqrt{1 - \rho^2}, \qquad \frac{1}{\sqrt{(1 - \rho_{12}^2)(1 - \rho_{13}^2)}}.$$

Kelley (1916, 1938) and Miner (1922) have given tables for this purpose. Trigonometrical tables are also useful in some cases. For instance, given ρ we can find $\theta = \cos^{-1}\rho$ and hence $\sin\theta$ ($= \sqrt{1 - \rho^2}$), $\operatorname{cosec}\theta$ ($= 1/\sqrt{1 - \rho^2}$), etc.

In such work some systematic method of reduction such as the Doolittle method is useful.


Example 15.2

In some investigations into the variation of crime among cities in the U.S.A., Ogburn found a correlation of −0.14 between crime rate ($X_1$) as measured by the number of known offences per thousand inhabitants and church membership ($X_5$) as measured by the number of church members of 13 years of age or over per 100 of total population of 13 years of age or over. The obvious inference is that religious belief acts as a deterrent to crime. Let us consider this more closely.

If $X_2$ = percentage of male inhabitants,
$X_3$ = percentage of total inhabitants who are foreign-born males, and
$X_4$ = number of children under 5 years old per 1000 married women between 15 and 44 years old,

Ogburn finds:

    $\rho_{12}$ = +0.44     $\rho_{24}$ = −0.19
    $\rho_{13}$ = −0.34     $\rho_{25}$ = −0.35
    $\rho_{14}$ = −0.31     $\rho_{34}$ = +0.44
    $\rho_{15}$ = −0.14     $\rho_{35}$ = +0.33
    $\rho_{23}$ = +0.25     $\rho_{45}$ = +0.85

From this and other data given in his paper it may be shown that we have, for the regression of $X_1$ on the other four variates,

$$X_1 - 19.9 = 4.51 (X_2 - 49.2) - 0.88 (X_3 - 30.2) - 0.072 (X_4 - 481.4) + 0.63 (X_5 - 41.6),$$

and for certain partial correlations

$$\rho_{15.3} = -0.03, \qquad \rho_{15.4} = +0.25, \qquad \rho_{15.34} = +0.23.$$

Now we note from the regression equation that when the other factors are constant $X_1$ and $X_5$ are positively related, i.e. church membership appears to be positively associated with crime. How does this effect come to be masked so as to give a negative correlation in the coefficient of zero order $\rho_{15}$?

We note in the first place that the correlation between crime and church membership when the effect of $x_3$, the percentage of foreigners, is excluded, is near zero. The correlation when $x_4$, the number of young children, is excluded, is positive; and the correlation when both $x_3$ and $x_4$ are excluded is again positive. It appears from the regression equation that a high percentage of foreigners and a high proportion of young children act as deterrents to crime. Now both these factors are positively correlated with church membership (foreign immigrants being mainly Catholic and more fecund). These correlations submerge the positive influence on crime of church membership in the rest of the population. The apparently negative effect of church membership on crime thus appears to be due to the more law-abiding character of the foreign immigrants, who are also the more zealous churchmen.

The reader may care to refer to Ogburn's paper for a more complete discussion.
The Multivariate Normal Distribution 


15.12. We now turn to consider the generalisation of the univariate and bivariate normal distributions to the case of p variables.

Consider the multivariate distribution

$$dF = y_0 \exp\Big\{-\tfrac12 \sum_{r,s=1}^{p} \alpha_{rs}\, \frac{x_r x_s}{\sigma_r \sigma_s}\Big\}\, dx_1 \ldots dx_p, \tag{15.27}$$

which evidently reduces, when p = 1 or 2, to the normal type. We shall call this the multivariate normal distribution, and proceed to consider how the constants α are related to the correlations of the variates. It is, of course, assumed that the α's are such as to ensure the convergence of the distribution function. For this it is necessary and sufficient that the quadratic form $\Sigma\, \alpha_{rs} x_r x_s / (\sigma_r \sigma_s)$ shall be positive-definite, i.e. that there is a real linear transformation reducing it to the sum of squares of p (or, in degenerate cases, fewer) new variates.


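Make the transformation

$$\frac{x_r}{\sigma_r} = \sum_{j=1}^{p} l_{rj}\, \xi_j, \qquad r = 1, \ldots, p, \tag{15.28}$$

and choose the l's so that the exponent of (15.27) becomes $-\tfrac12 \Sigma \xi^2$. Then we have

$$\sum_{r,s} \sum_{j,k} \alpha_{rs}\, l_{rj}\, l_{sk}\, \xi_j \xi_k = \sum_j \xi_j^2$$

and hence, writing (α) for the matrix of the quantities α, (l) for that of the l's and (l′) for the transpose of (l), we have

$$(l')(\alpha)(l) = 1. \tag{15.29}$$

Further, the Jacobian of the transformation from the $x_r/\sigma_r$ to the ξ's is $|l|$, the determinant of the l's, and hence the integral of dF is given by

$$y_0\, \sigma_1 \ldots \sigma_p\, |l| \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\big(-\tfrac12 \Sigma \xi^2\big)\, d\xi_1 \ldots d\xi_p = y_0\, \sigma_1 \ldots \sigma_p\, (2\pi)^{p/2}\, |l|.$$

Hence, since from (15.29) $|\alpha|\, |l|^2 = 1$, we have

$$y_0 = \frac{1}{(2\pi)^{p/2}\, \sigma_1 \ldots \sigma_p\, |l|} = \frac{|\alpha|^{1/2}}{(2\pi)^{p/2}\, \sigma_1 \ldots \sigma_p}. \tag{15.30}$$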

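Let us now find the characteristic function of the distribution. We have to integrate over the range of the x's the exponential of

$$\sum_j i t_j x_j - \tfrac12 \sum_r \xi_r^2 = -\tfrac12 \sum_r \Big[\xi_r^2 - 2 \sum_j (i t_j \sigma_j l_{jr}\, \xi_r)\Big]$$
$$= -\tfrac12 \sum_r \Big(\xi_r - \sum_j i t_j \sigma_j l_{jr}\Big)^2 + \tfrac12 \sum_r \Big(\sum_j i t_j \sigma_j l_{jr}\Big)^2.$$

The first part reduces on integration to a constant. The second gives the exponential of a series of terms of second degree in t, the coefficient of $t_j t_k \sigma_j \sigma_k$ being

$$-\tfrac12 \sum_r l_{jr}\, l_{kr}.$$

Now $\Sigma_r\, l_{jr} l_{kr}$ is the element in the jth row and kth column of the matrix (l)(l′) and hence, from (15.29), of the matrix $(\alpha)^{-1} = (A)$, say. Hence we may write

$$\phi(t_1, \ldots, t_p) = \exp\Big\{-\tfrac12 \sum_{j,k} A_{jk}\, \sigma_j \sigma_k\, t_j t_k\Big\}. \tag{15.31}$$

But when this is expanded the term in $\sigma_j \sigma_k t_j t_k$ is $-\rho_{jk}$ by definition and hence (A) is the matrix (ω) of equation (15.20). Thus

$$\phi(t_1, \ldots, t_p) = \exp\Big(-\tfrac12 \sum_{j,k} \rho_{jk}\, \sigma_j \sigma_k\, t_j t_k\Big). \tag{15.32}$$

Furthermore,

$$(\alpha) = (A)^{-1} = (\omega)^{-1}$$

and hence the distribution itself may be written in the form

$$dF = \frac{1}{(2\pi)^{p/2}\, \omega^{1/2}\, \sigma_1 \ldots \sigma_p} \exp\Big\{-\frac{1}{2\omega} \sum_{r,s=1}^{p} \omega_{rs}\, \frac{x_r x_s}{\sigma_r \sigma_s}\Big\}\, dx_1 \ldots dx_p, \tag{15.33}$$

the minors $\omega_{rs}$ being taken with their proper signs. For example, with the bivariate form

$$\omega = \begin{vmatrix} 1 & \rho \\ \rho & 1 \end{vmatrix}$$

and hence $\omega = 1 - \rho^2$, $\omega_{11} = \omega_{22} = 1$, $\omega_{12} = \omega_{21} = -\rho$, so that the distribution becomes the familiar form

$$dF = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\Big\{-\frac{1}{2(1 - \rho^2)} \Big(\frac{x_1^2}{\sigma_1^2} - \frac{2\rho x_1 x_2}{\sigma_1 \sigma_2} + \frac{x_2^2}{\sigma_2^2}\Big)\Big\}\, dx_1\, dx_2.$$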
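The reduction of the general form (15.33) to the familiar univariate and bivariate densities can be checked numerically. In the Python sketch below the function names `mvn_density` and `bivariate_density` and the trial values are assumptions of the illustration; the quantities $\omega_{rs}/\omega$ are obtained as the elements of the inverse of the correlation matrix.

```python
import numpy as np

def mvn_density(x, sigma, rho):
    """Frequency function in the form (15.33) for zero means.

    x, sigma are length-p sequences; rho is the p x p correlation matrix.
    (Hypothetical helper for illustration.)
    """
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    rho = np.asarray(rho, dtype=float)
    p = len(x)
    omega = np.linalg.det(rho)
    alpha = np.linalg.inv(rho)              # alpha_rs = omega_rs / omega
    u = x / sigma
    quad = u @ alpha @ u
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(omega) * sigma.prod())

def bivariate_density(x1, x2, s1, s2, r):
    """The familiar bivariate normal form quoted in the text."""
    q = (x1**2 / s1**2 - 2 * r * x1 * x2 / (s1 * s2) + x2**2 / s2**2) / (1 - r**2)
    return np.exp(-0.5 * q) / (2 * np.pi * s1 * s2 * np.sqrt(1 - r**2))

general = mvn_density([0.3, -1.2], [2.0, 0.5], [[1.0, 0.4], [0.4, 1.0]])
familiar = bivariate_density(0.3, -1.2, 2.0, 0.5, 0.4)

# With p = 1 the form reduces to the univariate normal density
univariate = mvn_density([1.0], [2.0], [[1.0]])
univariate_direct = np.exp(-0.5 * (1.0 / 2.0) ** 2) / (np.sqrt(2 * np.pi) * 2.0)
print(general, familiar, univariate, univariate_direct)
```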


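15.13. For any fixed $x_2, \ldots, x_p$ the exponent of (15.33) reduces to the normal univariate form in $x_1$ with mean given by

$$\frac{x_1}{\sigma_1} = -\frac{\omega_{12}}{\omega_{11}} \frac{x_2}{\sigma_2} - \frac{\omega_{13}}{\omega_{11}} \frac{x_3}{\sigma_3} - \ldots - \frac{\omega_{1p}}{\omega_{11}} \frac{x_p}{\sigma_p}. \tag{15.34}$$

Thus the regression of $x_1$ on the other variates is exactly linear. The variance of $x_1$ in any array is $\sigma_1^2\, \omega / \omega_{11}$ and the distribution is thus homoscedastic. It follows generally that the regression of any variate on any or all of the others is linear. Comparing (15.33) with (15.21) we see that the distribution may be written

$$dF = \frac{1}{(2\pi)^{p/2}\, \omega^{1/2}\, \sigma_1 \ldots \sigma_p} \exp\Big\{-\tfrac12 \Big(\sum_r \frac{x_r^2}{\sigma^2_{r.12\ldots p}} - \sum_{r \ne s} \rho_{rs.12\ldots p}\, \frac{x_r x_s}{\sigma_{r.12\ldots p}\, \sigma_{s.12\ldots p}}\Big)\Big\}\, dx_1 \ldots dx_p, \tag{15.35}$$

where the secondary suffixes in the ρ's and σ's do not, of course, contain their own primary suffixes.

Since every x is normally distributed, every linear function of the x's is so, as may be seen at once from (15.33). In particular the residuals are normally distributed.

If in (15.33) we make the substitution

$$\zeta_1 = x_1, \qquad \zeta_2 = x_{2.1}, \qquad \zeta_3 = x_{3.21}, \quad \text{etc.},$$

the exponent will be a quadratic function of the ζ's. In this function all product terms $\zeta_j \zeta_k$, j ≠ k, must vanish, for the covariance of $\zeta_j$ and $\zeta_k$ vanishes in virtue of the remark at the end of 15.4. It follows that the distribution function may be written in the form

$$dF = \frac{1}{(2\pi)^{p/2}\, \sigma_1 \sigma_{2.1} \sigma_{3.21} \ldots} \exp\Big\{-\tfrac12 \Big(\frac{x_1^2}{\sigma_1^2} + \frac{x_{2.1}^2}{\sigma_{2.1}^2} + \frac{x_{3.21}^2}{\sigma_{3.21}^2} + \ldots\Big)\Big\}\, dx_1\, dx_{2.1}\, dx_{3.21} \ldots \tag{15.36}$$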


two residuals ®у and aj, is of 
Consider, for example, а; and 23,21: 


i aS Pik „ represents the average 
j.a and x, о, being based on the sum СА Ж). In the 


-populations corresponding to particular assigned 


Sampling Distributions of Partial Correlation and Regression Coefficients 


15.14. We now consider the sampling distributions of the coefficients of partial correlation and regression. For large samples the values of Chapter 14 appropriate to correlations and regressions of zero order may be used (subject to the proviso as to the unreliability of the standard error for ρ unless the sample is very large). For example, the variance of $r_{jk.q}$ in the normal case is given by

$$\operatorname{var}(r_{jk.q}) = \frac{1}{n} (1 - \rho^2_{jk.q})^2, \tag{15.37}$$

where n is the sample number; and that of the regression coefficient by

$$\operatorname{var}(b_{jk.q}) = \frac{1}{n} \frac{\sigma^2_{j.kq}}{\sigma^2_{k.q}}. \tag{15.38}$$

The proof of these results by the direct methods of Chapter 9 is a very tedious piece of algebra. They follow simply, however, from the remark of the previous section that the correlation between any two deviations $x_{j.q}$ and $x_{k.q}$ is of the normal type with coefficient $\rho_{jk.q}$; for it follows that $r_{jk.q}$ is distributed as the correlation between two normal variates. Similar considerations apply to the regression coefficients. It will be shown presently that if the original distribution was based on n observations, that of $r_{jk.q}$ is of the form of the correlation $r_{jk}$ based on n − s observations, where s is the number of secondary subscripts in q; but as our equations are only true to order $n^{-1}$ the divisor in (15.37) and (15.38) may remain at n without further error.


15.15. Consider now the geometrical representation of 15.9. Suppose we have three points Q, R, S in the n-fold space, represented by $x_1 \ldots x_n$, $y_1 \ldots y_n$, $z_1 \ldots z_n$ respectively, the origin being P and the variables measured from their mean. Then the coefficient of correlation between x and y is the cosine of the angle QPR, that between y and z the cosine of RPS and that between z and x the cosine of SPQ. Now imagine a sphere described with unit radius and centre P, cutting PQ, PR and PS in Q′, R′, S′. Then will the partial correlation $r_{xy.z}$ be the cosine of the angle of the spherical triangle Q′S′R′, and so for the other two partial correlations. This was, in effect, proved in 15.10, for the angle Q′S′R′ is the angle between the projections of PQ and PR upon the space perpendicular to PS.

Now we may make an orthogonal transformation, corresponding to a rotation of the co-ordinate axes, without affecting the correlations; moreover, if the n values of one variate x are independent and normally distributed so will be the n values of the transformed variates. Let us then make such a transformation and take PS as one of the new co-ordinate axes. It is then apparent that the distribution of $r_{xy.z}$, which is the cosine of an angle in the space perpendicular to PS, is the same in form as that of $r_{xy}$ except that, being in (n − 1) dimensions, it is based on (n − 1) independent pairs of normally distributed variates instead of n.

Hence for samples from a normal population the distribution of the partial correlation coefficient of the first order from n sets of observations is the same as that of a correlation of zero order from (n − 1) sets of observations. By a repetition of the same argument it follows that the distribution of a correlation coefficient of the sth order is that of the correlation of zero order from (n − s) sets of observations. The results of the previous chapter are thus immediately applicable to partial correlations. If, of course, s is small compared with n, the distribution of partials is sensibly the same as that of ordinary correlations, which confirms the approximation of the previous section.
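The loss of one effective observation per secondary subscript can be illustrated by simulation. The Python sketch below is an assumption-laden illustration, not a proof: it draws three independent normal variates, computes the first-order partial correlation from n = 10 observations by (15.24), and compares its sampling variance with that of a zero-order correlation from n − 1 = 9 observations. It also uses the standard result, assumed here rather than taken from the text, that for an uncorrelated normal parent the variance of a zero-order r from m pairs is 1/(m − 1).

```python
import numpy as np

def partial_r(x, y, z):
    """Sample partial correlation r_{xy.z} via the first-order formula (15.24)."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

rng = np.random.default_rng(7)
n, reps = 10, 10000

# Partial correlations of first order from n observations (uncorrelated normal parent)
partials = np.array([partial_r(*rng.normal(size=(3, n))) for _ in range(reps)])

# Zero-order correlations from n - 1 observations
zero_order = np.array([np.corrcoef(rng.normal(size=(2, n - 1)))[0, 1]
                       for _ in range(reps)])

var_partial = partials.var()
var_zero = zero_order.var()
print(var_partial, var_zero, 1 / (n - 2))
```

The two simulated variances should lie close to one another and to 1/(n − 2), whereas a zero-order correlation from all n observations would have the smaller variance 1/(n − 1).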


The Multiple Correlation Coefficient 


15.16. As in 14.22, the multivariate regression equation can be used to estimate the values of one variate from given values of the others; but in order to see how good such an estimate is likely to be we require to know whether the values "predicted" by the regression equation are in close relationship to the observed values. Consider the regression of $X_1$ on the other variates:

$$X_1 = \beta_{12.34\ldots p} X_2 + \beta_{13.24\ldots p} X_3 + \ldots + \beta_{1p.23\ldots(p-1)} X_p. \tag{15.39}$$

If we substitute an observed set of values $x_2 \ldots x_p$ we shall get a quantity $e_{1.23\ldots p}$, say, differing from the observed $x_1$ by the residual quantity $x_{1.23\ldots p}$, so that

$$x_1 - e_{1.23\ldots p} = x_{1.23\ldots p}. \tag{15.40}$$

We may then judge of the accuracy of the representation of the observed $x_1$'s by the regression equation by correlating $x_1$ and $e_{1.23\ldots p}$. We have

$$\Sigma (x_1\, e_{1.23\ldots p}) = \Sigma x_1^2 - \Sigma x^2_{1.23\ldots p} \tag{15.41}$$

and

$$\Sigma (e^2_{1.23\ldots p}) = \Sigma (x_1^2) - \Sigma (x^2_{1.23\ldots p}). \tag{15.42}$$

Hence the correlation between $x_1$ and $e_{1.23\ldots p}$, say $R_{1(2\ldots p)}$, is given by

$$R_{1(2\ldots p)} = \frac{\operatorname{cov}(x_1, e_{1.23\ldots p})}{(\operatorname{var} x_1 \operatorname{var} e_{1.23\ldots p})^{1/2}} = \frac{(\sigma_1^2 - \sigma^2_{1.23\ldots p})^{1/2}}{\sigma_1},$$

giving

$$1 - R^2_{1(2\ldots p)} = \frac{\sigma^2_{1.23\ldots p}}{\sigma_1^2}. \tag{15.43}$$

$R_{1(2\ldots p)}$ is called the Multiple Correlation Coefficient between $x_1$ and $x_2 \ldots x_p$. We have, similarly, multiple correlation coefficients of any variate on some or all of the others, e.g. $R_{1(23)}$, $R_{2(13)}$, etc.

Two alternative forms of R are worth noticing. From (15.43) and (15.21) we have

$$1 - R^2_{1(2\ldots p)} = \frac{\omega}{\omega_{11}}, \tag{15.44}$$

and from (15.43) and (15.17)

$$1 - R^2_{1(2\ldots p)} = (1 - \rho^2_{12})(1 - \rho^2_{13.2}) \ldots (1 - \rho^2_{1p.23\ldots(p-1)}). \tag{15.45}$$

It follows from (15.45) that since no ρ is greater than unity, $1 - R^2$ is not greater than any factor on the right, so that R must be at least as great as the absolute value of any ρ entering into (15.45). R itself is essentially positive, for $\sigma_1 \ge \sigma_{1.23\ldots p}$ (equation (15.18)).

If R = 0, all the ρ's in (15.45) must be zero. In this case $x_1$ is completely uncorrelated with any of the other variates.

On the other hand, if R = 1, $\sigma_{1.23\ldots p} = 0$ and every observed $x_1$ coincides with the value given by the regression equation, i.e. $x_1$ is a linear function of the other variates. R thus provides a measure of the relationship between $x_1$ and the other variates.
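The two alternative forms (15.44) and (15.45) can be compared on Hooker's correlations from Example 15.1. The Python sketch below is illustrative; the variable names are assumptions of the example, and the first-order partial correlation is obtained from (15.24).

```python
import numpy as np

rho = np.array([[1.00, 0.80, -0.40],
                [0.80, 1.00, -0.56],
                [-0.40, -0.56, 1.00]])   # zero-order correlations of Example 15.1

# Form (15.44): 1 - R^2 = omega / omega_11
omega = np.linalg.det(rho)
omega_11 = np.linalg.det(rho[1:, 1:])
one_minus_R2_det = omega / omega_11

# Form (15.45): 1 - R^2 = (1 - rho_12^2)(1 - rho_13.2^2)
r12 = rho[0, 1]
r13_2 = (rho[0, 2] - rho[0, 1] * rho[2, 1]) / np.sqrt(
    (1 - rho[0, 1] ** 2) * (1 - rho[2, 1] ** 2))
one_minus_R2_prod = (1 - r12**2) * (1 - r13_2**2)

R = np.sqrt(1 - one_minus_R2_det)        # multiple correlation R_1(23)
print(one_minus_R2_det, one_minus_R2_prod, R)
```

On these data R comes out at about 0.80, only slightly above $\rho_{12}$, which reflects the small value of $\rho_{13.2}$: temperature adds little to the prediction of yield once rainfall is taken into account.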


15.18. The coefficient R has an interesting geometrical interpretation. It was noted in 15.10 that the vector $PQ_{1.2\ldots p}$ is orthogonal to the space of (p − 1) dimensions defined by $P, Q_2, \ldots, Q_p$. Consequently the angle between the vector $PQ_1$ and the space $P, Q_2, \ldots, Q_p$ is the complement of the angle $Q_1 P Q_{1.2\ldots p}$, the sine of which is $R_{1(2\ldots p)}$. From this standpoint we see that if R = 0, $PQ_1$ is also orthogonal to the space $P, Q_2, \ldots, Q_p$, i.e. that $x_1$ is uncorrelated with $x_2 \ldots x_p$. If R = 1, $PQ_1$ lies in the space and $x_1$ is linearly dependent on $x_2 \ldots x_p$.

15.19. The coefficient R, as mentioned in 14.24, is analogous to the correlation ratio η, and in fact from some points of view the two are formally identical. Given a set of variate-values we may consider the variance of $x_1$ as composed of the sum of two variances, for we have, by definition,

$$\operatorname{var} x_1 = \sigma_1^2 = \sigma_1^2 - \sigma^2_{1.2\ldots p} + \sigma^2_{1.2\ldots p}$$
$$= \operatorname{var}(e_{1.2\ldots p}) + \operatorname{var}(x_1 - e_{1.2\ldots p}). \tag{15.46}$$

Thus the variance of $x_1$ may be regarded as the sum of the variances (1) of the deviations of $x_1$ from the values given by the regression equation, and (2) of those values themselves. We may write (15.46) as

$$\operatorname{var}(x_1) = \sigma_1^2 R^2_{1(2\ldots p)} + \sigma_1^2 (1 - R^2_{1(2\ldots p)}). \tag{15.47}$$

Now consider again equation (14.75) in the form

$$\operatorname{var} x = \operatorname{var} x\, \{\eta^2 + 1 - \eta^2\} = \sigma^2 \eta^2 + \sigma^2 (1 - \eta^2). \tag{15.48}$$

The relation with (15.47) is evident. It is redeemed from triviality by the fact that, just as the two parts on the right-hand side of (15.48) are independent in samples from an uncorrelated normal population, so are those in (15.47) in samples from a multivariate normal population for which the parent R is zero. For in that case $x_1$ is independent of the other variables and therefore deviations of $x_1$ from the regression values are independent of the deviations of those values about their mean.


15.20. From this fact we can derive the sampling distribution of R (the sample value of the multiple correlation coefficient) when the population value is zero and the population is normal. In fact, as in 14.24, we see that $(1 - R^2)/R^2$ is the quotient of two independent variables. The numerator is distributed in the Type III form with N − p degrees of freedom, for it is a multiple of the variance of $x_{1.23\ldots p}$; var $x_1$ will be distributed as the sum of the squares of N variates about their mean, i.e. with N − 1 degrees of freedom, var $x_{1.2}$ with N − 2 degrees of freedom, and so on, every additional subscript lowering the degrees of freedom by unity, as in 15.15. Further, the denominator is distributed in the Type III form with p − 1 degrees of freedom, for it is the difference of var $x_1$, which has N − 1 degrees, and var $x_{1.23\ldots p}$, which has N − p degrees.* Thus the distribution of $R^2$ is formally the same as (14.85) with $R^2$ instead of $\eta^2$, i.e. is

$$dF = \frac{1}{B\{\tfrac12 (p - 1), \tfrac12 (N - p)\}}\, (R^2)^{\frac12 (p - 3)}\, (1 - R^2)^{\frac12 (N - p - 2)}\, dR^2. \tag{15.49}$$

* It is not, of course, true in general that the difference of two Type III variates is distributed in the Type III form. In the present case we can find an orthogonal transformation of the variables to new independent normal variables, of which one may be taken to be the residual $x_{1.23\ldots p}$.

This can be reduced to the z-form by writing

$$z = \tfrac12 \log_e \Big\{\frac{R^2}{1 - R^2} \cdot \frac{N - p}{p - 1}\Big\}, \qquad \nu_1 = p - 1, \quad \nu_2 = N - p. \tag{15.50}$$

The mean value of $R^2$ is the positive quantity (p − 1)/(N − 1).
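The mean value (p − 1)/(N − 1) of $R^2$ in the null case can be checked by simulation. The Python sketch below draws repeated samples of N = 20 from p = 4 independent normal variates and averages the sample $R^2$, computed from the determinantal form (15.44). The sample sizes, seed and function name `sample_R2` are assumptions of this illustration.

```python
import numpy as np

def sample_R2(data):
    """Squared multiple correlation of column 0 on the remaining columns,
    computed as 1 - omega / omega_11 from the sample correlation matrix."""
    c = np.corrcoef(data, rowvar=False)
    return 1.0 - np.linalg.det(c) / np.linalg.det(c[1:, 1:])

rng = np.random.default_rng(42)
N, p, reps = 20, 4, 4000

# Independent normal variates, so that the population multiple correlation is zero
r2 = np.array([sample_R2(rng.normal(size=(N, p))) for _ in range(reps)])

mean_r2 = r2.mean()
expected = (p - 1) / (N - 1)
print(mean_r2, expected)
```

Even though the parent coefficient is zero, the sample $R^2$ averages about 3/19 here, which is why a moderate R from a small sample with several variates carries little weight.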


15.21. We proceed to find the distribution of R in samples from a normal multi- 
variate population when R is not zero. Two preliminary remarks are necessary. 
In the first place, any multivariate normal population can, by a linear transformation, 


be transformed to new variates which are normally distributed and independent. One 
such transformation has been given in 15.13. 


Secondly, any linear transformation leaves the multiple correlation coefficient in- 
variant, that is to say, the coefficient between ZI andes. x, is the same as that between 


x, and the transformed variables Grabs Gy Referring to (15.43) we see that, apart 
from the constant бї, R 


19...) depends only on 01.»...5; and since the regressions are 
chosen so as to minimise this quantity, the same minimum is reached whether we use the 
variables 2, . . . хр or the linearly related variables БЕЛГТ hp Conversely, if the corre- 
lation between x, and £, is а maximum for all possible sets o 


f £s, then that correlation is 
the multiple correlation coefficient between z, and the £s, and x, is uhcorrelated with 
Ges кор 


From the geometrical standpoint of 15.10, let us take the sample vectors $PQ_2 \ldots PQ_p$
and in the space defined by these vectors choose another set $PS_2 \ldots PS_p$ which are
mutually orthogonal. These will correspond to the transformed variates ξ, and the angle
between $PQ_1$ and the space remains unaltered; i.e. R is invariant.

Let us now choose $\xi_2$ so that the correlation between $x_1$ and $\xi_2$ is a maximum in the
population. Then if $PS_2$ is the sample vector corresponding to $\xi_2$, $PQ_1$ will be orthogonal
to all the other vectors $PS_3 \ldots PS_p$ (since $x_1$ is then independent of $\xi_3 \ldots \xi_p$).
In any given sample the correlation between $x_1$ and $\xi_2$ will not be equal, in general,
to $\mathbf{R}$ (though the correlation in the population is $\mathbf{R}$), but to a quantity r, say, varying from
sample to sample and equal to $\cos Q_1PS_2$. Let PT be the vector representing the sampling
regression of $x_1$ on the ξ's. This will lie in the ξ-space (cf. Fig. 15.2). Then

Fig. 15.2.


PT makes an angle $\cos^{-1} R$ with $Q_1P$. Let $Q_1K$ be perpendicular to PT and $Q_1L$ to
$PS_2$. Then

$$r = PL/PQ_1 = (PK/PQ_1)(PL/PK) = R\cos\psi \tag{15.51}$$

where ψ is the angle KPL.

Consider the joint distribution of R and r. With a similar notation to that of 7.7 we
write P(x | y) for the frequency element of x when y is given. Then, $\mathbf{R}$ denoting the parent
value of the multiple correlation, we have, as in 7.10,

$$P(R, r \mid \mathbf{R} \neq 0) = P(R \mid r, \mathbf{R} \neq 0)\,P(r \mid \mathbf{R} \neq 0). \tag{15.52}$$

Now from equation (15.45) the distribution of R, given r, is equivalent to that of a function
of partial correlations of $x_1$ with $\xi_3 \ldots \xi_p$ which, in our present case, is independent of $\mathbf{R}$.
Thus $P(R \mid r, \mathbf{R} \neq 0) = P(R \mid r, \mathbf{R} = 0)$. Writing now

$$P(r \mid \mathbf{R} \neq 0) = P(r \mid \mathbf{R} = 0)\,\Pi(\mathbf{R}, r), \tag{15.53}$$

substituting in (15.52) and integrating out for r, we get

$$P(R \mid \mathbf{R} \neq 0) = P(R \mid \mathbf{R} = 0)\int P(r \mid R, \mathbf{R} = 0)\,\Pi(\mathbf{R}, r).$$

To evaluate $P(r \mid R, \mathbf{R} = 0)$ we note that this is equivalent to $P(\psi \mid R, \mathbf{R} = 0)$ in virtue
of (15.51) with R fixed. Now when $\mathbf{R} = 0$, $x_1$ and $\xi_2$ are independent. If we imagine the
space $\xi_2 \ldots \xi_p$ fixed, $x_1$ will vary at random with respect to $\xi_2$ in it, independently of the
angle $\cos^{-1} R$ between $x_1$ and the ξ-space. Hence $P(\psi \mid R, \mathbf{R} = 0) = P(\psi \mid \mathbf{R} = 0)$ and thus

$$P(R \mid \mathbf{R} \neq 0) = P(R \mid \mathbf{R} = 0)\int P(\psi \mid \mathbf{R} = 0)\,\Pi(\mathbf{R}, R\cos\psi). \tag{15.54}$$


Now $P(R \mid \mathbf{R} = 0)$ is given by (15.49) with n written for N. Further, since $x_1$ and $\xi_2$ are
distributed in the bivariate normal form with parent correlation $\mathbf{R}$, the distribution of r,
from 14.14, may be written


a pq. M c. 
ка ст r2)n-? dr 
Tn) 


i-o dz 
ТЫР ствата ае af 


-œ (cosh z — Rr)y-i 
and since the first factor is $P(r \mid \mathbf{R} = 0)$ the second factor on the right is the function
$\Pi(\mathbf{R}, r)$ of (15.53).

For $P(\psi \mid \mathbf{R} = 0)$ note that, for fixed $PS_2$, PT may vary over a space of (p − 1)
dimensions, and for fixed ψ cuts off an element on the unit hypersphere proportional to
$\sin^{p-3}\psi\,d\psi$. Hence

$$P(\psi \mid \mathbf{R} = 0) = \frac{\Gamma\{\tfrac12(p-1)\}}{\sqrt{\pi}\,\Gamma\{\tfrac12(p-2)\}}\,\sin^{p-3}\psi\,d\psi.$$


Finally, on substitution of the various factors in (15.54), we find for the distribution of R, 


(3) 
dF = E 


2r Ls Еа R? 
me eA а ) 2 (1 — R?) s а(Е?) 


T (ol sin ?-? y dz dy E . (15.55) 
oJ — (cosh z — RR cos y)-1 


This may be expressed as a hypergeometric function. Expanding the integrand in
powers of cos ψ we have, since odd powers vanish on integration,


e" + 2j—2X sin ?-? y cos 27 y aj 
BC *) re tem 


j= 


л ims omis ag 
and since | cos 7 y sin 2—3 dy = 2(P — 3 = *) 
10 2 
у dz 12 + 2) —1 
est fy cosh "-142/z в(® 2 ) 


the integral becomes 
n + 2) — 2 p—2 9) +1 la+2—1 
Qa D) ET) (HE) aam 


= 90) : А | 
ai 2 Haam n —19p-.1 l 


rr OmU SEES Ic s . (15.56) | 
2 2 


whence we find, from (15.55), after a little further reduction, 


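$$dF = \frac{\Gamma\{\tfrac12(n-1)\}}{\Gamma\{\tfrac12(p-1)\}\,\Gamma\{\tfrac12(n-p)\}}\,(1-\mathbf{R}^2)^{\frac12(n-1)}\,(R^2)^{\frac12(p-3)}\,(1-R^2)^{\frac12(n-p-2)}\,F\{\tfrac12(n-1),\ \tfrac12(n-1),\ \tfrac12(p-1),\ \mathbf{R}^2R^2\}\,d(R^2). \tag{15.57}$$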


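15.22. Writing a = ½(p − 1), b = ½(n − p), and denoting the parent value by $\mathbf{R}$, we have

$$dF = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,(1-\mathbf{R}^2)^{a+b}\,(R^2)^{a-1}(1-R^2)^{b-1}\,F(a+b,\ a+b,\ a,\ R^2\mathbf{R}^2)\,dR^2 \tag{15.58}$$

$$\phantom{dF} = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,(1-\mathbf{R}^2)^{a+b}\,(R^2)^{a-1}(1-R^2)^{b-1}\,(1-R^2\mathbf{R}^2)^{-(a+2b)}\,F(-b,\ -b,\ a,\ R^2\mathbf{R}^2)\,dR^2. \tag{15.59}$$

It may be shown that

$$\mu_1'(R^2) = 1 - \frac{b}{a+b}\,(1-\mathbf{R}^2)\,F(1,\ 1,\ a+b+1,\ \mathbf{R}^2). \tag{15.60}$$

In particular, when $\mathbf{R} = 0$ we have the known result

$$\mu_1'(R^2) = \frac{a}{a+b} = \frac{p-1}{n-1}. \tag{15.61}$$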




For large n we have approximately 
a + (b — 3)R? + R° 


u(R?) = re gee 


v. Stet 0669) 


For the second moment 


_ bb +0 Вз) jue : 
Bs = ER qup ct END 


с ОЕ EE лава, в (15.63) 
ЕБ E ead Е 
or approximately “Rec = Rs 

(82) = Lx edis ч : е d . (15.64) 


т 
which, however, breaks down near $\mathbf{R} = 0$. It would, in fact, appear that the distribution
of R tends to normality when $\mathbf{R} \neq 0$ but not when $\mathbf{R} = 0$ (cf. Exercise 15.6).
Example 15.3 

From Example 15.1 we have found $\sigma^2_{1.23}$ = 0.2448, $\sigma^2_1$ = 0.6864, from which we have

$$R^2_{1(23)} = 1 - \frac{0.2448}{0.6864} = 0.6433,$$

indicating that the regression equation is a fairly close representation of the data, since
R, the correlation between observed $x_1$'s and those provided by the equation, is high,
about 0.80.
It is hardly necessary to test the significance of such a value, but we will do so to illustrate
the arithmetic involved. If $x_1$ were uncorrelated with the other variates we should
have $\mathbf{R} = 0$, and on the assumption that the population is normal (a reasonable assumption
for crop yields, sunshine and rainfall records) we may use equation (15.50). We have,
since p = 3, n = 20,

$$z = \tfrac12 \log_e\left(\frac{0.6433}{0.3567}\cdot\frac{17}{2}\right) = 1.36, \qquad \nu_1 = 2,\quad \nu_2 = 17.$$

From Appendix Table 5 the 1 per cent. significance point of z for ν₁ = 2, ν₂ = 17 is 0.9051,
so that the observed R is almost certainly significant, z being much greater than can be
accounted for by sampling alone.
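
The arithmetic of this example can be checked with a short sketch. The function name `z_from_r2` and its argument names are illustrative assumptions, not notation from the text; the formula is that of (15.50), with the sample size passed as `n`.

```python
import math

def z_from_r2(r2, p, n):
    """z of (15.50) for a sample R^2 from n observations on p variates.

    Returns z together with the degrees of freedom nu1 = p - 1, nu2 = n - p.
    """
    nu1, nu2 = p - 1, n - p
    z = 0.5 * math.log((r2 / (1.0 - r2)) * (nu2 / nu1))
    return z, nu1, nu2

# Figures of Example 15.3: R^2 = 0.6433, p = 3, n = 20.
z, nu1, nu2 = z_from_r2(0.6433, p=3, n=20)
print(round(z, 2), nu1, nu2)  # 1.36 2 17
```

The value 1.36 is then compared with the tabulated 1 per cent. point 0.9051, as in the text.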


NOTES AND REFERENCES 


The theory of partial correlation is mainly due to Yule (1907). The reader may refer
to M. Ezekiel's book (1930) for a detailed discussion of the practical side of correlation
analysis. See also a paper on the theoretical side by Frisch (1929).

For a knowledge of the sampling properties of the partial correlations we are indebted
to Yule (1907), who pointed out the applicability of large sampling “normal” formulae
for coefficients of zero order to the partial coefficients, and to R. A. Fisher (1924), who
is responsible for the exact result for small samples from a normal population and the
distribution of the multiple correlation coefficient (1928). Some approximate results for
the latter had been obtained by Isserlis (1917) and P. Hall (1927). Wishart (1931, 1932)
has studied the exact distribution of R and the formally equivalent η. Both of Fisher’s
papers are notable examples of the power of the geometrical method of deducing sampling
distributions.

In comparing formulae given by various writers it is as well to examine whether the 


total number of variates (our p) or the number of dependent variates (p — 1) is being 
used as a constant in the equations. 


Ezekiel, M. (1930), Methods of Correlation Analysis, Chapman and Hall, London; John 
Wiley and Sons, New York. 

Fisher, R. A. (1924), “ The distribution of the partial correlation coefficient," Metron, 
3; 829. 

—— (1928), “ The general sampling distribution of the multiple correlation coefficient," 
Proc. Roy. Soc., A, 121, 654. 

Frisch, R. (1929), “Correlation and Scatter in statistical variables,” Nordic Statistical
Journal, 1, 36.

Hall, P. (1927), “ Multiple and partial correlation coefficients in the case of an n-fold variate 
system,” Biometrika, 19, 100. 

Hooker, R. H. (1907), “The correlation of the weather and the crops,” Jour. Roy. Stat.
Soc., 65, 1.

Isserlis, D. (1914), “On the partial correlation ratio, Part I, Theoretical,” Biometrika,
10, 391, and “Part II, Numerical,” ibid. (1916), 11, 50.

—— (1917), “The variation of the multiple correlation coefficient in samples drawn from
an infinite population with normal distribution,” Phil. Mag., 34, 205.

Kelley, T. L. (1916), “ Tables to facilitate the calculation of partial coefficients of correlation 
and regression equations," Bulletin of the University of Texas, No. 197. 

—— (1938), The Kelley Statistical Tables, Macmillan. 

Miner, J. R. (1922), “Tables of √(1 − r²) and 1 − r² for use in partial correlations, etc.,”
Johns Hopkins Press, Baltimore.

Ogburn, W. F. (1935), “Factors in the variation of crime among cities,” Jour. Amer.
Stat. Ass., 30, 12.

Wishart, J. (1931), “The mean and second-moment coefficient of the multiple correlation
coefficient in samples from a normal population,” Biometrika, 22, 353.

—— (1932), “Note on the correlation ratio,” Biometrika, 23, 441.

Yule, G. U. (1907), “On the theory of correlation for any number of variables treated
by a new system of notation,” Proc. Roy. Soc., A, 79, 182.


EXERCISES 
15.1. Show that 


e E бим» + Pip t= Dy 43...(p—1) 
— Pip33...(p— 
and that (р-1) D1.23...(p—1) 
P12.34...(p—1) = 7 P12-M..p F Pip.23...(p=1) P2p.13...(p—1) 


223.58 
a Pip.23...p-1))! (1 — Põp.i3. 0-0) 


(Yule, 1907.) 




15.2. Show that for p variates there are $\binom{p}{2}$ correlation coefficients of order zero
and $\binom{p-2}{s}\binom{p}{2}$ of order s. Show further that there are $\binom{p}{2}2^{p-2}$ correlation coefficients
altogether and $\binom{p}{2}2^{p-1}$ regression coefficients.
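
The counts asked for in Exercise 15.2 can be checked by direct enumeration. The following is only an illustrative sketch, not a proof; the function name `count_partial_correlations` is an assumption introduced here. Each coefficient is identified by an unordered pair of primary subscripts together with a set of s secondary subscripts chosen from the remaining p − 2 variates.

```python
from itertools import combinations
from math import comb

def count_partial_correlations(p):
    """Count the correlation coefficients of each order s among p variates.

    A coefficient is an unordered pair {i, j} of primary subscripts plus a
    set of s secondary subscripts drawn from the other p - 2 variates.
    """
    by_order = {}
    variates = range(1, p + 1)
    for i, j in combinations(variates, 2):
        rest = [v for v in variates if v not in (i, j)]
        for s in range(p - 1):  # orders 0, 1, ..., p - 2
            by_order[s] = by_order.get(s, 0) + sum(1 for _ in combinations(rest, s))
    return by_order

counts = count_partial_correlations(5)
print(counts)                # coefficients of each order for p = 5
print(sum(counts.values()))  # all orders together
```

Each regression coefficient distinguishes the order of its two primary subscripts, so the number of regression coefficients is twice the total printed above.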


15.3. Show that for given ρ₁₂ and ρ₁₃, ρ₂₃ must lie in the range

$$\rho_{12}\rho_{13} \pm (1 - \rho_{12}^2 - \rho_{13}^2 + \rho_{12}^2\rho_{13}^2)^{\frac12}$$

and that if $x_1$ and $x_2$, $x_1$ and $x_3$ are uncorrelated no inference can be drawn from that fact
as to the correlation between $x_2$ and $x_3$.


15.4. Show that if ρ₁₂ be zero, ρ₁₂.₃ will not be zero unless at least one of ρ₁₃, ρ₂₃ is zero.


15.5. If the correlations of zero order among a set of variables are all equal to ρ,
show that every partial correlation of the sth order is ρ/(1 + sρ).
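
A numerical check of Exercise 15.5 may be sketched as follows. It assumes the usual recursion for partial correlations, $r_{12.3} = (r_{12} - r_{13}r_{23})/\{(1-r_{13}^2)(1-r_{23}^2)\}^{\frac12}$, and uses the fact that, by symmetry, all coefficients of a given order are equal, so the square root disappears. The function name `partial_corr_equal` is hypothetical.

```python
from fractions import Fraction

def partial_corr_equal(rho, s):
    """Partial correlation of order s when every zero-order correlation is rho.

    If all coefficients of one order equal r, the recursion gives the next
    order as (r - r*r) / (1 - r*r), i.e. r / (1 + r); exact rational
    arithmetic is used so the result can be compared exactly.
    """
    r = Fraction(rho)
    for _ in range(s):
        r = (r - r * r) / (1 - r * r)
    return r

rho = Fraction(1, 2)
for s in range(5):
    assert partial_corr_equal(rho, s) == rho / (1 + s * rho)
print(partial_corr_equal(rho, 3))  # 1/5
```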


_ 15.6. Show that the distribution of the multiple correlation coefficient R tends, 
in normal samples, for large n, to the form 


p-3 
Ap? x 
ap — E79 3o. exp (— Bt — 499) 


1 pep 1 EVE 
XLA = deg B2 
E E jam 
Where В = R*(n — р), B* = R*(n — р). 
In particular, where р = 4, 
rw B mri aan 4 2 
аР = asp САН (— 3(B — 8)*) — exp (— &(B + £):y] dB. 


Thus, when β = 0 the distribution of B does not tend to normality, but when β is not
zero and is thus large for finite $\mathbf{R}$, B is distributed approximately normally about β with
variance unity.


(Fisher, 1928.) 


15.7. Show that the distribution function of R in normal samples may be written, 
n — р is even, in the form 
— 1 27 К 
Hn—p-2) r(E— 3) (1 — В) 


2 


2-1 
Q — R$ gea reri r(? > 22 (1 — R2Ripoisep 


x EE Ae 52, = Ripe. 


(Fisher, 1928.) 


CHAPTER 16 
RANK CORRELATION 


16.1. In previous chapters we have considered the dependence of attributes, as
measured by coefficients of association, and that of variables as measured (in the normal 
case at least) by product-moment correlation. In this chapter we shall consider a type 
of relationship which, in a sense, occupies an intermediate position between the two, the 
correlation of ranks. 

Consider a set of individuals which can be arranged in order according to some quality, 
such as a set of men according to ability or a set of musical compositions according to the 
degree of preference with which they are regarded by some observer. An ordered arrange- 
ment of the objects will be called a ranking and the ordinal number of a given individual 
in the ranking is called his rank. Thus with a ranking of n individuals there will be one 
rank corresponding to each of the n ordinal numbers 1 to n. 


16.2. Ranking is less general than the classification of attributes in the sense that
the division of a population into classes A and not-A, or $A_1, A_2 \ldots A_k$, does not require
any ordering of those classes; the measures of contingency and association discussed in
Chapter 13 are invariant under rearrangements of columns or rows in the tables. On the
other hand, individuals arranged in an ordinary frequency table have their interrelationships
more closely defined than if they are merely ranked, so that ranking is in a sense more
general than measurement according to a variate-scale. To put the point in a slightly
different way, a ranking is invariant under any transformation which stretches the scale
of measurement of the variate.


16.3. In practice, ranked data usually arise in two ways :— 

(a) From material which could be measured on a variate-scale but which is not so 
measured for reasons of economy, lack of adequate instruments, and so forth. This class 
includes the case where the data are given as measurements but are then ranked on the 
basis of those measurements in order, for example, to reduce the arithmetical work in 
investigating correlations. 

(b) From material which is believed to be capable of measurement theoretically but 
cannot be measured in practice, e.g. human preferences for food or intelligence. Ranking 
methods are sometimes applied rather uncritically to material which the experimenter 
considers to be capable of ranking, whether it has been demonstrated to be so or not. We
shall return to this point below.

It is always possible by suitable conventions to impose a scale of measurement and
hence a variate-system on ranked material; but the process is sometimes rather artificial
and we shall in the first instance consider ranked material as such, without regard to
the possibility of there being any pre-existent or superimposed variate in the background.


Spearman’s Coefficient of Rank Correlation 


16.4. Consider a set of n individuals ranked according to two variables in the orders
$X_1, X_2 \ldots X_n$; $Y_1, Y_2 \ldots Y_n$, where the X's and the Y's are permutations of the
numbers 1 to n. Our problem is to discuss the relationship between the X's and the Y's.
If the individuals are denoted by $A_1 \ldots A_n$ we may write the rankings in the form

$$\begin{array}{lcccc}
\text{Individual} & A_1 & A_2 & \ldots & A_n \\
\text{Ranking 1} & X_1 & X_2 & \ldots & X_n \\
\text{Ranking 2} & Y_1 & Y_2 & \ldots & Y_n
\end{array} \tag{16.1}$$


We note first of all that the concordance between rankings is perfect if and only if
$X_j = Y_j$ for all j. It is natural to consider the differences $X_j - Y_j$ (= $d_j$, say) as measuring
the difference between the two rankings. They are zero if and only if the concordance
is perfect and their magnitude to some extent reflects the divergence of the rankings from
perfect concordance. We also note that

$$\sum_{j=1}^{n} d_j = \sum_{j=1}^{n} X_j - \sum_{j=1}^{n} Y_j = 0, \tag{16.2}$$

for each of the sums of X and Y is the sum of the first n natural numbers. We might
then take Σ|d| as a measure of discordance, and a coefficient based on this quantity was in
fact proposed by Spearman (1906). It is however subject to several disadvantages, similar
to those attaching to the mean deviation, and a more suitable measure is obtained by
using Σ(d²). It is easy to see that the maximum value possible for Σ(d²) is (n³ − n)/3. For
Σ(d²) is the greatest if the d's are as different as possible, i.e. if one ranking is the reverse
of the other, so that the d's are (n − 1), (n − 3) ... −(n − 3), −(n − 1), though not
necessarily in that order. In this case

$$\begin{aligned}
\Sigma(X_jY_j) &= 1(n) + 2(n-1) + 3(n-2) + \ldots + n\{n-(n-1)\} \\
&= 1(n+1-1) + 2(n+1-2) + \ldots + n\{(n+1)-n\} \\
&= (n+1)\sum_{j=1}^{n} j - \sum_{j=1}^{n} j^2 \\
&= \frac{n(n+1)(n+2)}{6}.
\end{aligned} \tag{16.3}$$

Thus

$$\Sigma(d^2) = \Sigma(X^2) + \Sigma(Y^2) - 2\Sigma(XY) = \frac{n(n+1)(2n+1)}{3} - \frac{n(n+1)(n+2)}{3} = \frac{n^3-n}{3}. \tag{16.4}$$

We then define

$$\rho = 1 - \frac{6\Sigma(d^2)}{n^3-n} \tag{16.5}$$

as the Spearman coefficient of rank correlation. If the concordance between rankings is
perfect Σ(d²) = 0 and ρ = 1. If the discordance is perfect ρ = −1. In other cases ρ lies
between these limits.
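
The statement that Σ(d²) attains its greatest value, (n³ − n)/3, when one ranking is the reverse of the other can be confirmed for small n by exhaustive search. This is a brute-force sketch under that limitation (it examines all n! permutations); the function names are illustrative and do not appear in the text.

```python
from itertools import permutations

def sum_d2(x, y):
    """Sum of squared rank differences between two rankings."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def spearman_rho(x, y):
    """Spearman's coefficient as defined in (16.5)."""
    n = len(x)
    return 1 - 6 * sum_d2(x, y) / (n ** 3 - n)

def max_sum_d2(n):
    """Largest sum of squared differences against the natural order 1..n."""
    natural = tuple(range(1, n + 1))
    return max(sum_d2(natural, perm) for perm in permutations(natural))

for n in range(2, 7):
    assert max_sum_d2(n) == (n ** 3 - n) // 3
print(spearman_rho((1, 2, 3, 4), (4, 3, 2, 1)))  # -1.0
```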


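It is worth noticing that ρ is the product-moment coefficient of correlation between
X and Y when we regard the ranks as variate-values. For we then have

$$\mu_1'(X) = \mu_1'(Y) = \frac{n+1}{2}, \tag{16.6}$$

$$\mu_2(X) = \mu_2(Y) = \frac{1}{n}\sum_{j=1}^{n} j^2 - \left(\frac{n+1}{2}\right)^2 = \frac{n^2-1}{12},$$

$$\begin{aligned}
n \operatorname{cov}(X, Y) &= \Sigma(XY) - n(\mu_1')^2 \\
&= \tfrac12\{\Sigma(X^2) + \Sigma(Y^2) - \Sigma(d^2)\} - n(\mu_1')^2 \\
&= \frac{n(n^2-1)}{12} - \tfrac12\Sigma(d^2),
\end{aligned} \tag{16.7}$$

so that the product-moment correlation coefficient of X and Y is

$$\frac{\operatorname{cov}(X, Y)}{\{\mu_2(X)\,\mu_2(Y)\}^{\frac12}} = 1 - \frac{6\Sigma(d^2)}{n^3-n} = \rho.$$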
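
The identity between ρ and the product-moment coefficient of the ranks can be illustrated numerically. The sketch below uses exact fractions; the pair of rankings is chosen purely for illustration, and the function names are assumptions of this example. Because both rankings are permutations of 1 to n their variances are equal, so the product-moment coefficient reduces to covariance divided by variance.

```python
from fractions import Fraction

def rho_from_d2(x, y):
    """Spearman's coefficient from the squared rank differences, as in (16.5)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - Fraction(6 * d2, n ** 3 - n)

def product_moment_on_ranks(x, y):
    """Product-moment correlation with the ranks treated as variate-values."""
    n = len(x)
    mean = Fraction(n + 1, 2)  # mean of the numbers 1..n
    cov = sum((a - mean) * (b - mean) for a, b in zip(x, y)) / n
    var = sum((a - mean) ** 2 for a in x) / n  # the same for y
    return cov / var

x = list(range(1, 11))
y = [4, 7, 2, 10, 3, 6, 8, 1, 5, 9]
assert rho_from_d2(x, y) == product_moment_on_ranks(x, y)
print(rho_from_d2(x, y))  # 23/165
```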


16.5. There is an element of artificiality in the Spearman coefficient as defined which
we must remove. The ranks are ordinal numbers and cannot without justification be
operated on by the laws of cardinal arithmetic. For instance, if $A_1$ is ranked 4th and
8th by two observers, $d_1$ is (4 − 8); but what does 4th minus 8th mean, and what significance
is to be attached to its square? It is not entirely trivial to note that the necessary
transition from ordinals to cardinals may be made without invoking a variate-scale. When
we rank a member as r we mean that in the set of n, (r − 1) members are ranked higher.
This number (r − 1) is a cardinal and in our particular example 4th minus 8th may be
regarded as meaning that the difference of the number of members ranked higher by the
two observers was 4.


Example 16.1 


Two judges in a beauty contest rank the 10 competitors in the following order : 
CRD 3 el д Tee 0/5 E 
ИС oe БЫ 9,48 9 
What is the rank correlation ? 
The differences between the ranks are 
pee XA SES 


-—pLom:8 
which sum to zero as they should. 


Thus Σ(d²) = 4 + 9 + 49 + 36 + etc. = 128,

$$\rho = 1 - \frac{6 \times 128}{990} = 0.224.$$

This indicates some sort of concordance between the standards of the two judges, but not
a very strong concordance.


Example 16.2 


In the previous example there was no information about the “real” order of the
competitors, and ρ merely served to measure the degree of agreement between judges.
Consider, however, the following case, where an objective order is known: In a test for
ability to distinguish shades of colour, ten discs were prepared ranging from light to dark
red, and a subject was asked to arrange them in order. The true order, as determined by
a colorimetric method, was

1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

The order produced by the subject was

4, 7, 2, 10, 3, 6, 8, 1, 5, 9.

What sort of a judge is he?

The differences are

−3, −5, 1, −6, 2, 0, −1, 7, 4, 1

and Σ(d²) = 142, ρ = 0.139.

The coefficient is low and we conclude that the observer was a poor judge.


An Alternative Coefficient 

16.6. A second coefficient of rank correlation which has certain advantages may
be obtained as follows: Consider again the ranking of the previous example

4 7 2 10 3 6 8 1 5 9 . . . (16.8)

Consider the order of the nine pairs of numbers obtained by taking the first number 4
with each succeeding number. The first pair, 4, 7, is in the correct order (in the sequence
1, 2, ... 10) and we therefore allot it the score +1. The second pair, 4, 2, is in the wrong
order and we therefore score −1. The nine scores will be found to be

+1 −1 +1 −1 +1 +1 −1 +1 +1 (= +3).

Consider next the scores of the second number 7, with its eight succeeding numbers. They
are

−1 +1 −1 −1 +1 −1 −1 +1, totalling −2.

Proceeding thus with each number we find 9 scores as follows:

+3, −2, +5, −6, +3, 0, −1, +2, +1.

The total of these scores is +5.

Now the maximum score obtained if the numbers are all in the objective order 1, 2,
... 10, is 45. We therefore define the rank correlation coefficient τ as the ratio of the
actual score to the maximum score, i.e., in the present case,

$$\tau = \frac{5}{45} = 0.111,$$

as compared with ρ = 0.139 for the Spearman coefficient.
Generally, if there are n individuals the maximum score, obtained if and only if they
are in the order (1, 2 ... n), is (n − 1) + (n − 2) + ... + 1 = n(n − 1)/2. Denoting
the actual score by S, we have then for the coefficient of rank correlation

$$\tau = \frac{2S}{n(n-1)}. \tag{16.9}$$
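
The scoring rule just described translates directly into code. The following is a minimal sketch, with function names chosen here for illustration: every pair of numbers taken in the order in which they stand scores +1 if it is in the natural order and −1 otherwise.

```python
from itertools import combinations

def kendall_score(ranking):
    """S: +1 for each pair standing in the natural order, -1 for each inverted pair."""
    # combinations() yields pairs (a, b) with a standing before b in the ranking.
    return sum(1 if a < b else -1 for a, b in combinations(ranking, 2))

def kendall_tau(ranking):
    """tau = 2S / (n(n - 1)), as in (16.9)."""
    n = len(ranking)
    return 2 * kendall_score(ranking) / (n * (n - 1))

ranking = [4, 7, 2, 10, 3, 6, 8, 1, 5, 9]  # the ranking (16.8)
print(kendall_score(ranking))              # 5
print(round(kendall_tau(ranking), 3))      # 0.111
```

This direct method examines all n(n − 1)/2 pairs, which is ample for rankings of the size considered in this chapter.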


16.7. The actual calculation of S may be shortened considerably. Looking again
at the ranking (16.8) we see that the number 1 has two numbers on its right and seven
on its left. We therefore score 2 − 7 = −5 and strike out the 1. In the remaining
ranking, the number 2 has 6 numbers on its right and two on its left, and hence we score
6 − 2 = +4; we then strike out the 2 and proceed with the 3 as before. It will be found
that the scores obtained are

−5, +4, +1, +6, −3, 0, +3, 0, −1.

The total of these scores is +5, and is equal to S. The rule is quite general. Its validity
is evident from the consideration that instead of taking each number with its succeeding
numbers we consider pairs contributing to S in a different way. Taking the number 1
first, and remembering that all other numbers are greater than 1, we see that any number
on the left must contribute −1, and any number on the right +1, to S. When 1 is struck
out the procedure remains valid for 2, and so on.
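
The striking-out rule can be sketched as follows; the function name `score_by_striking_out` is hypothetical. Each number in turn, starting from the smallest, scores the count of remaining numbers on its right less the count on its left, and is then removed.

```python
def score_by_striking_out(ranking):
    """Scores obtained by taking 1, 2, 3, ... in turn and striking each out."""
    remaining = list(ranking)
    scores = []
    for value in sorted(ranking)[:-1]:  # the largest number has nothing left to score
        pos = remaining.index(value)
        on_right = len(remaining) - 1 - pos
        on_left = pos
        scores.append(on_right - on_left)
        remaining.pop(pos)
    return scores

scores = score_by_striking_out([4, 7, 2, 10, 3, 6, 8, 1, 5, 9])
print(scores)       # [-5, 4, 1, 6, -3, 0, 3, 0, -1]
print(sum(scores))  # 5
```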

Alternatively the following procedure may be adopted. Considering again (16.8), we
see that the number 4 has on its right 6 greater numbers, the 7 has 3 greater numbers,
and so on, the numbers being

6, 3, 6, 0, 4, 2, 1, 2, 1,

totalling 25. There must therefore be 45 − 25 = 20 numbers lying to the right of successive
numbers in the ranking which are less than those numbers, and hence S = 25 − 20 = 5 as
before. Generally, if the number obtained by counting greater numbers is k,

$$S = 2k - \frac{n(n-1)}{2}$$

and thus

$$\tau = \frac{4k}{n(n-1)} - 1. \tag{16.10}$$

A check may be obtained by counting greater numbers lying to the left. If the total
of such numbers is l,

$$S = \frac{n(n-1)}{2} - 2l, \qquad \tau = 1 - \frac{4l}{n(n-1)}. \tag{16.11}$$


16.8. The extension of the use of τ to the case where no objective order is given
requires a little further consideration. Suppose we have two rankings as follows:
Ho du di Ay A, A, А Ay Аң 
E Q. QU T PEN MET EE 8-7 = . (16.12) 
Q CS) CI 0) Dye 5 do GE d. 08 




τ may be obtained by arranging one ranking in the natural order (1, 2 ... n) thus:


As А, A, A; As A, Аһ As 4, As 
P r2 i @ БО б miis ro ОС твит) 
Q' 4 up ler ао ee 


and then finding τ between P′ and Q′ as in the preceding section. We have however to
show that if we arrange Q in the natural order, giving


A, A, As As А, 4А; A, Ay А As 
Р” 8.35 1 g 6 2 ОР " . (16.14) 
Or 1 9i Sh cae p G aa] 8 9 10 


then τ between P″ and Q″ is the same as that between P′ and Q′. That this must be so
may be seen as follows:
In (16.13) the successive contributions to S are, as found by the method of 16.6,

+3, −2, +5, −6, +3, 0, −1, +2, +1.

Consider now the contributions to S from (16.14) when the short method of 16.7 is used. They
will be found to be exactly the same. If the permutation Q′ begins with $a_1$, the contribution to
$S_{Q'}$ from pairs involving $a_1$ will be $(n - a_1) - (a_1 - 1)$. In P″ the $a_1$th number will be 1 and
the contribution to $S_{P''}$ will also be $(n - a_1) - (a_1 - 1)$. If the second number in Q′ is
$a_2$ the contribution to $S_{Q'}$ will be $(n - a_2) - (a_2 - 1) \pm 1$ according to whether $a_1$ is
greater than $a_2$ or not. In P″ the $a_2$th number will be 2 and the contribution to $S_{P''}$ is
also $(n - a_2) - (a_2 - 1) \pm 1$ according to whether 1 lies on the left or the right of 2 in
P″, i.e. whether $a_1$ is greater than $a_2$ or not; and so on.
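
The equality of the two scores can be illustrated numerically. In the sketch below (all function names are assumptions of this example) Q′ is taken to be the ranking (16.8), P″ is formed as its inverse permutation, and S is obtained for each by the counting rule of 16.7, i.e. S = 2k − n(n − 1)/2 where k is the number of greater numbers lying to the right.

```python
def greater_to_right(ranking):
    """k of 16.7: total count of later numbers exceeding each number in turn."""
    return sum(
        sum(1 for later in ranking[i + 1:] if later > value)
        for i, value in enumerate(ranking)
    )

def score_from_k(ranking):
    """S = 2k - n(n - 1)/2."""
    n = len(ranking)
    return 2 * greater_to_right(ranking) - n * (n - 1) // 2

def inverse_ranking(ranking):
    """If position i holds the value v, the inverse holds i at position v."""
    inverse = [0] * len(ranking)
    for position, value in enumerate(ranking, start=1):
        inverse[value - 1] = position
    return inverse

q_prime = [4, 7, 2, 10, 3, 6, 8, 1, 5, 9]
p_double_prime = inverse_ranking(q_prime)
print(p_double_prime)  # [8, 3, 5, 1, 9, 6, 2, 7, 10, 4]
print(score_from_k(q_prime), score_from_k(p_double_prime))  # 5 5
```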

In practical calculations it is not necessary to carry out the rearrangements. Consider
again (16.12). The number 1 in Q has an 8 above it in P. In the ranking of the A's, 8
has two members to the right and seven to the left. Score, therefore, −5, and strike
out $A_8$. The number 2 in Q has a 3 above it in P, and $A_3$ has six members to its right (ignoring
$A_8$) and two to its left, score +4; and so on, the scores being

−5, +4, +1, +6, −3, 0, +3, 0, −1

totalling +5 which is equal to S.


16.9. Like ρ, τ is +1 only if the correspondence between two rankings is perfect
and −1 only if the rankings are inverted. In actual practice the values given by the
two coefficients bear a nearly constant ratio (cf. 16.24) and one appears to be as good as
the other so far as providing a measure of ranking concordance is concerned. ρ is, however,
easier to calculate and is probably the most convenient to use. Against this must be
set certain difficulties in its sampling distribution, which will be referred to below, and the
fact that τ can be generalised to the case of partial rank correlations.


16.10. In considering the interpretation of any particular value of ρ or τ the question
naturally arises, are such values significant in the statistical sense, i.e. can they have arisen
by chance from a population in which the qualities under consideration are independent?
And further, can we assign a standard error to the observed values? The second question
is not an easy one to answer, or even to understand unless ranks are related to variate-values.
In the sampling of variates we are given a set of n values emanating from a population
of values. In the ranking case we are given n ordinal numbers, but it is useless
to consider them as emanating from a population of (different) ordinal numbers. The
point will be considered later when we introduce the concept of grades (16.25).

The sampling problem, however, acquires a definite meaning if the two qualities under
consideration are independent. In such a case the pairs of rankings of n members drawn
at random are independent; and consequently in a large number of samples there will
occur in equal amounts every ranking according to one quality associated with every
ranking according to the other. We are thus led to consider the distributions of ρ and τ
in populations consisting of all possible associations of all possible rankings. Clearly no
generality is lost if we fix one ranking as the order (1, 2 ... n) and consider its correlations
with the n! possible permutations of those numbers. If a given ρ or τ cannot, to an acceptable
degree of probability, have arisen from such a population, we are justified in concluding
that the two qualities have some definite relationship in the population.


Sampling Distribution of Spearman’s p in the Case of Independence 


16.11. Consider then the distribution of values of ρ in the population obtained by
correlating the order (1, 2 ... n) with every possible permutation of the n natural numbers.
We shall, in fact, find it more convenient to consider the distribution of Σ(d²) which is
simply related to ρ by equation (16.5). Certain elementary properties of the distribution
are obtainable immediately.

(a) Any value of Σ(d²) must be even; for Σ(d) = 0 and hence the number of odd
values of d, and thus of d², is even.

(b) The possible values of Σ(d²) range from 0 to ⅓(n³ − n) and hence there are
⅙(n³ − n) + 1 of them.

(c) The distribution is symmetrical, that is to say, to any value of ρ there corresponds
a negative value of ρ of the same magnitude. For, corresponding to any permutation
$X_1, X_2 \ldots X_n$, the inverted permutation is $X_n, X_{n-1}, \ldots X_1$. Σ(d²) calculated from
the permutation is then $\sum_{i=1}^{n}(X_i - i)^2$ and that from the permutation inverted is
$\sum_{i=1}^{n}\{X_i - (n + 1 - i)\}^2$. The sum of these two is

$$\Sigma(X_i^2) + \Sigma(i^2) - 2\Sigma(iX_i) + \Sigma(X_i^2) + \Sigma(n+1-i)^2 - 2\Sigma\{X_i(n+1-i)\}.$$

The first, second, fourth and fifth terms in this expression are equal to Σ(i²), i.e. to
⅙n(n + 1)(2n + 1). The sum of the third and sixth is

$$-2(n+1)\Sigma(X) = -n(n+1)^2.$$

Thus the sum of the two Σ(d²) is

$$\tfrac23 n(n+1)(2n+1) - n(n+1)^2 = \tfrac13(n^3 - n).$$

Thus we see from (16.5) that the sum of the corresponding ρ's is zero.

(d) It follows that all odd moments of the distribution of Σ(d²) about the mean vanish.

16.12. Consider the deviations between the order 1, 2, ... n and an order X. If
one deviation is known, then certain deviations become impossible for other ranks. For
instance, if the deviation $d_1$ between $X_1$ and 1 is (n − 1), then $X_1 = n$, and it is impossible
for the deviation between $X_2$ and 2 to be (n − 2); or for the deviation between $X_3$ and 3
to be (n − 3), and so on. Consider then the array:


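$$\begin{array}{ccccccc}
n-1 & n-2 & n-3 & \ldots & 2 & 1 & 0 \\
n-2 & n-3 & n-4 & \ldots & 1 & 0 & -1 \\
n-3 & n-4 & n-5 & \ldots & 0 & -1 & -2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
1 & 0 & -1 & \ldots & -(n-4) & -(n-3) & -(n-2) \\
0 & -1 & -2 & \ldots & -(n-3) & -(n-2) & -(n-1)
\end{array}$$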


If $d_i$ has the value in the rth row and the kth column, then $d_j$ cannot have the value in the
rth row and the lth column; and so on.

In fact, any permissible set of deviations is given by taking n entries from the above 
table so that no row or column contributes more than one entry. 

Hence to get Σ(d²) for any permissible set, write

$$E = \left|\begin{array}{ccccc}
a^0 & a^1 & a^4 & \ldots & a^{(n-1)^2} \\
a^1 & a^0 & a^1 & \ldots & a^{(n-2)^2} \\
a^4 & a^1 & a^0 & \ldots & a^{(n-3)^2} \\
\vdots & \vdots & \vdots & & \vdots \\
a^{(n-1)^2} & a^{(n-2)^2} & a^{(n-3)^2} & \ldots & a^0
\end{array}\right|$$

and Σ(d²) is given by the index of a of one of the terms obtained from E by choosing n
factors so that no row or column appears more than once and multiplying them together.
Thus the distribution of Σ(d²) is given by the totality of n! terms which can be constructed
in that way. E will be taken to be equal to the polynomial in a given by the sum of these
terms, the so-called “permanent.”
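
For small n the coefficients of the polynomial E can be obtained by brute force, since each of the n! terms corresponds to one permutation and its index is the Σ(d²) of that permutation against the natural order. The following sketch (the function name `sum_d2_distribution` is an assumption) simply enumerates the permutations; it is not the expansion by minors used for the tables in this chapter and becomes slow beyond n of about 9.

```python
from collections import Counter
from itertools import permutations

def sum_d2_distribution(n):
    """Frequency of each value of the sum of squared rank differences.

    Equivalent to collecting the coefficients of the permanent E: the key is
    the index of a, the value is the number of terms with that index.
    """
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        counts[sum((x - i) ** 2 for i, x in enumerate(perm, start=1))] += 1
    return dict(sorted(counts.items()))

print(sum_d2_distribution(3))  # {0: 1, 2: 2, 6: 2, 8: 1}
print(sum_d2_distribution(4))
```

For n = 3 this gives the polynomial 1 + 2a² + 2a⁶ + a⁸; the absence of a term in a⁴ corresponds to the zero frequency at Σ(d²) = 4.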


16.13. E bears an obvious analogy to the determinant, but it cannot be regarded
as such and expanded accordingly. If it could, the distribution of Σ(d²) would be obtained
without difficulty, for a determinant with the elements of E as given above may be shown
to be equal to

$$(1 - a^2)^{n-1}(1 - a^4)^{n-2}(1 - a^6)^{n-3} \ldots (1 - a^{2(n-1)}).$$

E, in fact, lacks the fundamental property of the determinant in that it does not change
sign if two rows or columns are interchanged.

Nevertheless certain of the rules of determinantal algebra remain true for E. The
most valuable is that E may be expanded in terms of its minors of any order in the usual
way. Expansion of this type is, in fact, rather easier with E than with the determinant,
for all terms of E are essentially positive and there are no difficulties with signs. Such
expansions were used in obtaining the distributions given below. There are also certain
devices which assist the expansion of E in virtue of its symmetry. Two which will be found
useful are as follows:

(a) Any minor of E is symmetrical in powers of a, i.e. is of the form

Aak + A,at-? + At? +... +A! + Азат 4 Ayam, 


(b) The effect of shifting a minor bodily across E is to multiply each term of its
expansion by a constant power of a,


TABLE 16.1 
Spearman’s ρ. Distribution of Σ(d²) for Values of n from 1 to 8.
Values of n. 


Z(d?) 1 2 3 4 5 6 7 8 
0 1 1 1 1 1 1 l I 
2 . 1 2 3 4 5 6 1 
4 . = 0 1 3 6 10 15 
6 . 2 4 6 9 14 22 

| 8 t 1 2 zj 16 29 47 
1 
10 . 2 6 12 26 54 
12 B 2 4 14 35 70 
14 . 4 10 24 46 94 
16 1 6 20 55 129 
18 3 10 2 54 124 
20 1 6 23 74 178 
22 10 28 70 183 
24 * 6 24 84 237 
26 ^ 10 34 90 238 
28 о 4 20 78 276 
30 : 6 32 90 264 
32 ч ae a2 29) 77379 
34 6 29 106 349 
36 3 29 123 380 
38 4 42 184 400 
40 . . . Е 1 32 147 517 
42 : . 3 : . 20 98 394 
44 5 5 à : б 34 168 542 
46 ^ . . . . 24 180 492 
48 “ . E . В 28 175 640 
50 . б E ; - 23 144 557 
52 . > 2 * Е 21 168 666 
54 Е . 3 ‘ > 20 144 595 
56 Р 5 E > Е 24 184 776 
(median) 
58 " s ‘ . = 14 . 684 
60 . à : 12 * 786 
62 5 " x . . 16 . 718 
64 5 : : . * 9 3 922 
66 б à š : : 6 s 745 
68 . i é à А 5 А 917 
70 . s г : Б 1 E 781 
72 3 . . s А . 982 
14 . . . : . : 826 
76 i н 
16 . . . 3 š = а 950 
> : » Е 5 " И 844 
EA . . E . . Е a 1066 
Em . s . А E а 5 845 
. n . . ә " 936 
(median) 
TOTALS 1 2 6 24 120 720 5040* 40,320* 


* Total of whole distribution, only the median value and the values
on one side of the median being shown in this table.


TABLE 16.2 


Spearman’s ρ. Probability that Σ(d²) will be Attained or Exceeded for Values of n from 4 to 8
inclusive.
(43). а 
0 2 4 6 8 10 |07 l4 | 16 199]. 2 | 22 |р 5 | "98 
zam | | 
pua 1 |0-958 | 0-833 | 0-792 | 0-625 | 0-542 | 0-458 | 0-375 | 0-208 | 0-167 | 0-042 
з = б 1 | 0-992 | 0:958 | 0-933 | 0-883 | 0-825 | 0-775 | 0-742 | 0-658 | 0-608 | 0-525 | 0-475 | 0-392 | 0-342 | 0-258 
mG 1 [0-999 | 0-992 | 0-983 | 0-971 | 0-949 | 0-932 | 0-912 | 0-879 | 0-851 | 0-822 | 0-790 | 0-751 | 0-718 | 0-671 
ЕЕЕ: 
n=7 1 | 1-000 | 0-999 | 0-997 | 0-994 | 0-988 | 0-983 | 0-976 | 0-967 | 0-956 | 0-945 | 0-931 | 0-917 | 0-900 | 0-882 
ав 1 [| 1-000 | 1-000 | 0-999 | 0-999 | 0-998 | 0-996 | 0-995 | 0-992 | 0-989 | 0-986 | 0-982 | 0-977 | 0-971 | 0-965 
| | 
30 32 34 36 | 38 | 40 | 42 | 44 | 46 | 48 | 50 | 52 | 54 | 56 | 58 
c ==. 
N= 5 |0225 | 0-175 | 0:117 | 0-067 | 0-042 | 0-0°83 
_ | 
n=6 [0:643 | 0-599 | 0-540 | 0-500 | 0-460 | 0-401 | 0-357 | 0:329 | 0-282 | 0-249 | 0-210 | 0-178 | 0-149 | 0-121 | 0-088 
—Q | 
? —7 10-867 | 0-849 | 0-823 | 0-802 | 0-778 | 0-751 | 0-722 | 0-703 | 0-669 | 0-643 | 0-609 | 0-580 | 0-547 | 0-518 | 0-482 
Duc a анань — 
^ —8 10-958 | 0-952 | 0-943 | 0-934 | 0-924 | 0-915 | 0-902 | 0-892 | 0-878 | 0-866 | 0-850 | 0-837 | 0-820 | 0-805 | 0-786 
г... C 
60 62 64 66 68 70 72 74 76 78 | 80 82 84 | 86 88 
E oe МЕНИ " 
^ — 6  |0-068 | 0-051 | 0-029 | 0-017 | 0-0°83 0-0214 
iss ND = 
n= 7 (0.453 | 0-420 | 0-391 | 0:357 | 0-331 | 0-297 | 0-278 | 0-249 | 0-222 | 0-198 | 0-177 | 0-151 | 0-133 | 0-118 | 0-100 
E c — 
^ —8 [0.769 | 0-750 | 0:732 | 0-709 | 0-690 | 0-608 | 0-648 | 0-624 | 0-003 | 0-580 | 0-559 | 0-533 | 0-512 | 0-488 | 0-467 
90 92 94 96 | 98 | 100 | 102 | 104 | 106 | 108 | 110 | 112 | 114 | 116 | 118 
cc || | 
= 7 |0083 | 0-069 | 0-055 | 0-044 | 0-033 | 0-024 | 0-017 | 0-012 | 0-0°62 0-034 0-014 0-020 
cos MN | | 
| : 
^ —8 | 0-441 | 0-420 | 0-397 | 0-376 | 0-352 | 0-332 | 0-310 | 0-291 | 0-268 | 0-250 | 0-231 | 0-214 | 0-195 | 0-180 | 0-163 
d | | | 
120 | 122 | 124 | 126 | 128 | 130 | 132 | 134 | 136 | 138 | 140 | 142 | 144 | 146 | 148 
SSS 
%=8 [0.150 | 0-134 | 0-122 | 0-108 | 0-098 | 0-085 | 0-076 | 0-066 | 0-057 | 0-048 | 0-042 | 0-035 | 0-029 | 0-023 | 0-018 
| | б 
150 152 154 | 156 | 158 160 162 164 166 168 
п = 8 0-014 | 0:011 0.0277 | ооз 0-0236 | 0-0223 | 0-011 | 0-0°57 | 0-0220 | 0-0125 




398 RANK CORRELATION 


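e.g. the minors

\[ M = \begin{vmatrix} a^0 & a^1 & a^4 \\ a^1 & a^0 & a^1 \\ a^4 & a^1 & a^0 \end{vmatrix} = a^0 + 2a^2 + 2a^6 + a^8 \]

and

\[ M' = \begin{vmatrix} a^4 & a^9 & a^{16} \\ a^1 & a^4 & a^9 \\ a^0 & a^1 & a^4 \end{vmatrix} = a^{12}(a^0 + 2a^2 + 2a^6 + a^8) \]

are related by

\[ M' = M a^{12}. \]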


16.14. The tables on pp. 396-7 show the frequencies of Σ(d²) for values of n from 1 to 8 inclusive and the probabilities that a given value of Σ(d²) will be attained or exceeded on random sampling for n from 4 to 8 inclusive.
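
The entries of both tables can be reproduced by direct enumeration of the n! rankings. The following Python sketch is illustrative only: the function names are our own, and the fixed ranking is assumed to be the natural order 1, 2, ... n.

```python
from collections import Counter
from itertools import permutations


def sum_d2_frequencies(n):
    """Frequency of each value of sum(d^2) when 1..n is paired with every ranking."""
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        d2 = sum((i - p) ** 2 for i, p in enumerate(perm, start=1))
        counts[d2] += 1
    return counts


def prob_attained_or_exceeded(n, value):
    """Probability that sum(d^2) is at least `value` on random sampling."""
    counts = sum_d2_frequencies(n)
    total = sum(counts.values())  # equals n!
    return sum(c for v, c in counts.items() if v >= value) / total


print(sorted(sum_d2_frequencies(4).items()))
print(round(prob_attained_or_exceeded(4, 16), 3))
```

Enumeration of this kind is practicable only for small n, which is why approximating curves are sought for larger values.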


16.15. The distributions of Table 16.1 are peculiar in several respects. For lower values of n they are distinctly bimodal. For n = 7 and n = 8 the frequency polygons have an unusual serrated profile, that for the latter being shown in Fig. 16.1, though normality is beginning to emerge. It will be shown below that as n → ∞ the distribution tends to normality, but it is not immediately obvious how a serrated polygon of this kind can do so.

[Fig. 16.1. Spearman's ρ. Frequency Polygon of Σ(d²) for n = 8. Horizontal axis: values of Σ(d²), from 0 to 168.]


DISTRIBUTION OF SPEARMAN’S p 399 


I think that the tails of the curve smooth out first, and that as n increases the smoothing runs up the curve towards the apex.


16.16. The calculation of frequencies for n greater than 8 would be a tedious process and can be obviated by finding curves which satisfactorily approximate to the distribution, at least so far as its distribution function is concerned. For this purpose we will find the second and fourth moments of ρ about its mean. The first and third, of course, are zero.

Suppose we measure the rank numbers from their mean, writing for the new variables x = X − ½(n + 1), y = Y − ½(n + 1). Then from 16.4 we have

\[ \rho = \frac{12\,\Sigma(xy)}{n^3 - n} = \frac{1}{N}\,\Sigma(xy), \text{ say,} \]

where N = (n³ − n)/12. Since E(ρ) = μ′₁(ρ) = 0 we have

\[ \operatorname{var}\rho = \mu_2(\rho) = \frac{1}{N^2}\,E\,\Sigma(x_i^2 y_i^2) + \frac{1}{N^2}\,E\,\Sigma(x_i x_j y_i y_j) \tag{16.15} \]

where i ≠ j. Now for any value of x, y may have any value from 1 to n. Hence

\[ E\,\Sigma(x^2 y^2) = n\,E(x^2)\,E(y^2) = n\left(\frac{n^2-1}{12}\right)^2 = \frac{N^2}{n}. \tag{16.16} \]

Further, in the product term of (16.15) there are n(n − 1) pairs of values i ≠ j and thus

\[ E\,\Sigma(x_i x_j y_i y_j) = n(n-1)\,E(x_i x_j y_i y_j) = n(n-1)\,\{E(x_i x_j)\}^2 \]
\[ = n(n-1)\left\{\frac{1}{n(n-1)}\,\Sigma(x_i x_j)\right\}^2 = n(n-1)\left\{\frac{(\Sigma x)^2 - \Sigma(x^2)}{n(n-1)}\right\}^2 = \frac{N^2}{n(n-1)}. \tag{16.17} \]

Hence, substituting from (16.16) and (16.17) in (16.15) we have

\[ \mu_2(\rho) = \frac{1}{n} + \frac{1}{n(n-1)} = \frac{1}{n-1}. \tag{16.18} \]

By the same technique it may be shown that

\[ \mu_4(\rho) = \frac{3(25n^3 - 38n^2 - 35n + 72)}{25n(n+1)(n-1)^3}. \tag{16.19} \]
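
As a check on (16.18) and (16.19), the moments may be computed exactly for small n by enumerating all rankings. The sketch below is ours and not part of the original derivation; it uses exact rational arithmetic so that the comparison is free of rounding error.

```python
from fractions import Fraction
from itertools import permutations


def rho_moments(n):
    """Exact second and fourth moments of Spearman's rho over all n! rankings."""
    base = range(1, n + 1)
    m2 = m4 = Fraction(0)
    count = 0
    for perm in permutations(base):
        d2 = sum((i - p) ** 2 for i, p in zip(base, perm))
        rho = 1 - Fraction(6 * d2, n ** 3 - n)
        m2 += rho ** 2
        m4 += rho ** 4
        count += 1
    return m2 / count, m4 / count


def mu4_formula(n):
    """Right-hand side of (16.19)."""
    return Fraction(3 * (25 * n ** 3 - 38 * n ** 2 - 35 * n + 72),
                    25 * n * (n + 1) * (n - 1) ** 3)


for n in range(2, 6):
    print(n, rho_moments(n), Fraction(1, n - 1), mu4_formula(n))
```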


16.17. Consider now the Type II symmetric distribution

\[ dF = \frac{1}{B(\tfrac12,\ \tfrac12 n - 1)}\,(1 - x^2)^{\frac{n-4}{2}}\,dx, \qquad -1 \le x \le 1. \tag{16.20} \]

The first and third moments are, of course, zero. The second and fourth are given by

\[ \mu_2 = \frac{B(\tfrac32,\ \tfrac12 n - 1)}{B(\tfrac12,\ \tfrac12 n - 1)} = \frac{1}{n-1} \tag{16.21} \]

\[ \mu_4 = \frac{B(\tfrac52,\ \tfrac12 n - 1)}{B(\tfrac12,\ \tfrac12 n - 1)} = \frac{3}{n^2 - 1}. \tag{16.22} \]

The distribution thus has its first three moments the same as those of Spearman's ρ in the case of independence. The fourth moments are the same to order n⁻², the difference being

\[ \frac{3}{n^2-1}\left\{1 - \frac{25n^3 - 38n^2 - 35n + 72}{25n(n-1)^2}\right\} \sim -\frac{36}{25n^3}, \]

i.e. of lower order in n than the moments themselves. It has therefore been suggested that the distribution (16.20) may be used instead of that of ρ to give the distribution function of the latter for moderate or large n. Tests on the distributions of Table 16.1 indicate that this is a justifiable approximation.

For instance, when n = 8 the distribution (16.20) becomes

\[ dF = \frac{1}{B(\tfrac12, 3)}\,(1 - x^2)^2\,dx \]

and by direct integration the probability of obtaining a value of x greater than x₀ in absolute value is

\[ 1 - \frac{15}{8}\left(x_0 - \tfrac23 x_0^3 + \tfrac15 x_0^5\right). \tag{16.23} \]

In comparing this with the values of the ρ-distribution it is as well to make a continuity correction, similar to that of 12.15, to allow for the fact that the distribution of ρ is discontinuous whereas that of x is continuous. If the values of Σ(d²) are regarded as spread over a range of one unit on each side of the actual value, the range of Σ(d²) is increased from ⅓(n³ − n) to ⅓(n³ − n) + 2, each terminal contributing a unit. Instead of writing x = ρ we will then write

\[ x = 1 - \frac{\Sigma(d^2)}{\tfrac16(n^3 - n) + 1}. \tag{16.24} \]

Now from Table 16.2 the probability of obtaining a value of ρ greater than 5/6 in absolute value, corresponding to Σ(d²) outside the range 14 to 154 inclusive, is 2 × 0.0053 = 0.0106. The appropriate x from (16.24) is 1 − 14/85 = 0.835, and this on substitution in (16.23) gives the probability of 0.0098. Similarly the chance of getting a value of Σ(d²) outside the range 26 to 142 inclusive is 0.0576. That given by (16.23) is 0.0561. The agreement is evidently good enough for most practical purposes and would, of course, improve as n increases.
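
The two comparisons just quoted may be reproduced as follows. This is a sketch under the assumptions of the text (n = 8 in (16.23), continuity correction as in (16.24)); the function names are hypothetical.

```python
def type_ii_tail_n8(x0):
    """Two-sided tail probability from (16.23), valid for n = 8 only."""
    return 1 - (15 / 8) * (x0 - (2 / 3) * x0 ** 3 + (1 / 5) * x0 ** 5)


def corrected_x(n, sum_d2):
    """Continuity-corrected x of (16.24) for a boundary value of sum(d^2)."""
    return 1 - sum_d2 / ((n ** 3 - n) / 6 + 1)


# Sum(d^2) outside the range 14 to 154, and outside the range 26 to 142.
print(round(type_ii_tail_n8(corrected_x(8, 14)), 4))
print(round(type_ii_tail_n8(corrected_x(8, 26)), 4))
```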




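16.18. If we put, in (16.20),

\[ t = x\sqrt{\frac{n-2}{1-x^2}} \]

we obtain the distribution

\[ dF = \frac{1}{\sqrt{n-2}\;B\{\tfrac12,\ \tfrac12(n-2)\}}\cdot\frac{dt}{\left(1 + \dfrac{t^2}{n-2}\right)^{\frac12(n-1)}} \tag{16.25} \]

the "Student" distribution of Example 10.6. If n is large the continuity correction may be neglected and to this approximation x = ρ, so that ρ may be tested in "Student's" distribution by writing

\[ t = \rho\sqrt{\frac{n-2}{1-\rho^2}}, \qquad \nu = n - 2. \tag{16.26} \]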


Example 16.3

In Example 16.2 we found a value of ρ = 0.139. Is this significant?

We have n = 10 and from (16.26)

\[ t = 0.139\sqrt{\frac{8}{1 - 0.139^2}} = 0.397. \]

From Appendix Table 3 we see that the chance of getting such a value or greater in absolute value is about 0.70 (= 2(1 − 0.65)). The value cannot therefore be regarded as significant.
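
The arithmetic of this example is easily mechanised. The short sketch below evaluates (16.26) only; the function name is our own, and the probability would still be read from a table of "Student's" distribution with n − 2 degrees of freedom.

```python
import math


def rho_to_t(rho, n):
    """Transform Spearman's rho to 'Student's' t by (16.26), with n - 2 degrees of freedom."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


print(round(rho_to_t(0.139, 10), 3))
```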


16.19. As n tends to infinity the B-distribution tends to the normal form and we therefore suspect that ρ also tends to normality. That this is in fact so may be seen as follows, the proof being due to Hotelling and Pabst (1936).

The general moment of ρ of even order is given by

\[ \mu_{2\alpha} = \frac{1}{S_2^{2\alpha}}\,E\,(x_1 y_1 + x_2 y_2 + \ldots + x_n y_n)^{2\alpha} \tag{16.27} \]

where S₂ is written for \(\sum_{i=1}^{n} x_i^2\) and generally S_p for \(\sum_{i=1}^{n} x_i^p\). When the parenthesis is expanded we may, in virtue of the independence of x and y, take expectations term by term, regarding the x's as constant. Now

\[ E(y_i^{a}) = \frac{1}{n}\,\Sigma(y^{a}) = \frac{1}{n}\,S_a, \qquad E(y_i^{a} y_j^{b}) = \frac{1}{n(n-1)}\,\Sigma(y_i^{a} y_j^{b}), \text{ etc.} \]


Hence

\[ S_2^{2\alpha}\,\mu_{2\alpha} = \sum A\,\frac{\Sigma(x_1^{a_1} x_2^{a_2} \cdots x_p^{a_p})\;\Sigma(y_1^{a_1} y_2^{a_2} \cdots y_p^{a_p})}{n(n-1)\cdots(n-p+1)} \tag{16.28} \]

where the coefficients A depend on the a's but not on n, and a₁ + a₂ + ... + a_p = 2α in every term. We proceed to show that the term of greatest degree in n in (16.28) is the term in Σ(x₁² x₂² ... x_α²).

The numerator of any term in (16.28), being a symmetric function of the x's, can be expressed in terms of the symmetric sums S_p. Further S_p vanishes if p is odd. Since any S_k is of degree k + 1 in n, the degree of a non-vanishing term \(S_{a_1} S_{a_2} \cdots S_{a_p}\) is

\[ \sum_{j=1}^{p} (a_j + 1) = 2\alpha + p. \]

Consequently the term of highest degree in n must contain as high a p as possible, that is to say as many S's as possible, subject to the requirement that the subscript of each S must be even.
Now consider a term

\[ \Sigma(x_1^{a_1} x_2^{a_2} \cdots x_p^{a_p}) = S_{a_1} S_{a_2} \cdots S_{a_p} - \Sigma\, S_{a_1 + a_2} S_{a_3} \cdots S_{a_p} + \ldots \tag{16.29} \]

If the a's are all even the term of highest degree on the right is, as just remarked, of degree 2α + p. If the a's are not all even, suppose there are m even ones and 2q odd ones (m + 2q = p). Then the first term in (16.29) vanishes and the term of highest degree which does not vanish must be obtained by grouping q pairs of odd a's, and hence is of degree 2α + m + q = 2α + p − q.

Now in (16.28) the degree of the denominator in each term is the number of different x's in the numerator. Thus the term of highest degree in n is of degree

\[ 2(2\alpha + p - q) - (m + 2q) = 4\alpha - m + 2p - 4q = 4\alpha + m. \]

This will be a maximum when m is a maximum and therefore when q is zero, in which case m = α. Hence the greatest degree in n in (16.28) arises from the term in Σ(x₁² ... x_α²), as stated. Now in the expansion of

\[ (x_1 y_1 + \ldots + x_n y_n)^{2\alpha} \]

the coefficient of x₁² ... x_α² y₁² ... y_α² is, by the multinomial theorem, (2α)!/2^α and hence

\[ \mu_{2\alpha} \sim \frac{(2\alpha)!}{2^{\alpha}}\cdot\frac{\{\Sigma(x_1^2 \cdots x_\alpha^2)\}^2}{\binom{n}{\alpha}\,S_2^{2\alpha}}. \tag{16.30} \]

The term of highest degree in n in Σ(x₁² ... x_α²) is that in S₂^α, the coefficient of which is evidently the reciprocal of that of Σ(x₁² ... x_α²) in

\[ \Big(\sum_{k=1}^{n} x_k^2\Big)^{\alpha}, \]

i.e. 1/α!. Thus, from (16.30),

\[ \mu_{2\alpha} \sim \frac{(2\alpha)!}{2^{\alpha}\,\alpha!}\left\{\frac{1}{n^{\alpha}} + O\!\left(\frac{1}{n^{\alpha+1}}\right)\right\}. \]


DISTRIBUTION OF τ 403


Now μ₂ = 1/(n − 1) and thus

\[ \frac{\mu_{2\alpha}}{\mu_2^{\alpha}} \to \frac{(2\alpha)!}{2^{\alpha}\,\alpha!} \tag{16.31} \]

i.e. to the moments of the normal distribution of unit variance. It follows from the Second Limit Theorem of 4.24 that the distribution of ρ tends to normality. The tendency is not, however, very rapid and we have already noticed the peculiar character of the distribution for lower n.


Distribution of τ in the Case of Independence

16.20. We now consider the distribution of the coefficient τ under similar conditions, that is to say in a population obtained by correlating a given ranking with all the n! possible rankings.

Consider a given ranking of the numbers 1, 2, ... n and the effect of inserting an additional number (n + 1) in the various possible places in the ranking, from the first place (preceding the first number) to the last place (following the last number).

Inserting a number at the beginning will add −n to the value of S of equation (16.9). Inserting it between the first and second will add −(n − 2) to S; and so on. Thus to any frequency-distribution of S for given n, say f(S, n), there will correspond frequencies f(S − n, n), f(S − (n − 2), n) ... f(S + n, n), the sum of which gives f(S, n + 1). If the frequency of a given S is the coefficient of x^S in a polynomial P(x), then the corresponding values of S in the frequency for (n + 1) are the coefficients of

\[ (x^{-n} + x^{-n+2} + \ldots + x^{n-2} + x^{n})\,P(x). \]

But the frequency-distribution of S when n = 2 is given by x⁻¹ + x¹, there being one value S = −1 and one value S = 1. Thus the frequencies of S for rankings of n are the coefficients of x^S in the array

\[ f = (x^{-1} + x)(x^{-2} + 1 + x^{2})(x^{-3} + x^{-1} + x + x^{3}) \cdots (x^{-(n-1)} + x^{-(n-3)} + \ldots + x^{n-3} + x^{n-1}). \tag{16.32} \]

It follows that the distribution of S, and hence that of τ, is symmetrical about zero. The values of S are either all odd or all even, according to whether ½n(n − 1) is odd or even.

The actual frequencies may be calculated by a figurate triangle, as follows:

Value of n | Frequencies of S
    1      | 1
    2      | 1  1
    3      | 1  2  2  1
    4      | 1  3  5  6  5  3  1
    5      | 1  4  9  15  20  22  20  15  9  4  1

In this array a number in the rth row is the sum of the number above it and the (r − 1) numbers to the left of that number. A little reflection will show that this rule follows from (16.32). The formation of the array is quite simple and several devices shorten the arithmetic. For instance, in part of the array towards the left a number in the rth row is the sum of the number immediately above it and the number immediately to the left. The array is symmetrical and the total in the rth row is r!
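
The formation of the array from the factors of (16.32) may be sketched in Python as follows. The routine is an illustration of our own devising, not a transcription of any published program; it multiplies in one factor at a time, exactly as in passing from n to n + 1 above.

```python
def s_frequencies(n):
    """Return {S: frequency} for rankings of n, built factor by factor from (16.32)."""
    freq = {0: 1}  # n = 1: a single ranking with S = 0
    for k in range(1, n):
        # Multiply by the factor x^-k + x^-(k-2) + ... + x^(k-2) + x^k.
        new = {}
        for s, f in freq.items():
            for shift in range(-k, k + 1, 2):
                new[s + shift] = new.get(s + shift, 0) + f
        freq = new
    return dict(sorted(freq.items()))


for n in range(1, 6):
    print(n, list(s_frequencies(n).values()))
```

For n = 5 this reproduces the last row of the triangle, and the totals are n! as they should be.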




The following tables show the frequency-distribution of S for values of n from 1 to 10 
inclusive and the probability that a value of S will be attained or exceeded. 


TABLE 16.3 


Rank Coefficient τ. Distribution of S for Values of n from 1 to 10 (only the Positive Half of the Symmetrical Distribution shown).

                     Values of n
  S       1       4       5         8         9
  0       1       6      22     3,836    29,228
  2       .       5      20     3,736    28,675
  4       .       3      15     3,450    27,073
  6       .       1       9     3,017    24,584
  8       .       .       4     2,493    21,450
 10       .       .       1     1,940    17,957
 12       .       .       .     1,415    14,395
 14       .       .       .       961    11,021
 16       .       .       .       602     8,031
 18       .       .       .       343     5,545
 20       .       .       .       174     3,606
 22       .       .       .        76     2,191
 24       .       .       .        27     1,230
 26       .       .       .         7       628
 28       .       .       .         1       285
 30       .       .       .         .       111
 32       .       .       .         .        35
 34       .       .       .         .         8
 36       .       .       .         .         1

                     Values of n
  S       2       3       6         7        10
  1       1       2     101       573   250,749
  3       .       1      90       531   243,694
  5       .       .      71       455   230,131
  7       .       .      49       359   211,089
  9       .       .      29       259   187,959
 11       .       .      14       169   162,337
 13       .       .       5        98   135,853
 15       .       .       1        49   110,010
 17       .       .       .        20    86,054
 19       .       .       .         6    64,889
 21       .       .       .         1    47,043
 23       .       .       .         .    32,683
 25       .       .       .         .    21,670
 27       .       .       .         .    13,640
 29       .       .       .         .     8,095
 31       .       .       .         .     4,489
 33       .       .       .         .     2,298
 35       .       .       .         .     1,068
 37       .       .       .         .       440
 39       .       .       .         .       155
 41       .       .       .         .        44
 43       .       .       .         .         9
 45       .       .       .         .         1


16.21. As may be seen by comparing Tables 16.1 and 16.3, the distribution of S, and hence that of τ, is much smoother than that of Σ(d²) and ρ. We show below that it tends to normality, and in fact the tendency is so rapid that for values of n greater than 10 the normal distribution provides an adequate approximation. We proceed to find the second and fourth moment of the distribution.

If we differentiate the expression f in (16.32) and equate x to 1 we evidently obtain the first moment of S; and generally, writing θ for the operator x ∂/∂x,

\[ n!\,\mu'_r = \left[\theta^r f\right]_{x=1}. \tag{16.33} \]

For example, when r = 1 we have

\[ n!\,\mu'_1 = (-1 + 1)(1 + 1 + 1)(1 + 1 + 1 + 1)\ldots + (1 + 1)(-2 + 0 + 2)(\ldots) + \text{etc.} = 0. \tag{16.34} \]


TABLE 16.4

Probability that S attains or exceeds a Specified Value. (Shown only for Positive Values. Negative Values obtainable by Symmetry.)

                    Values of n
  S         4         5          8          9
  0     0.625     0.592      0.548      0.540
  2     0.375     0.408      0.452      0.460
  4     0.167     0.242      0.360      0.381
  6     0.042     0.117      0.274      0.306
  8         .     0.042      0.199      0.238
 10         .    0.0083      0.138      0.179
 12         .         .      0.089      0.130
 14         .         .      0.054      0.090
 16         .         .      0.031      0.060
 18         .         .      0.016      0.038
 20         .         .     0.0071      0.022
 22         .         .     0.0028      0.012
 24         .         .    0.00087     0.0063
 26         .         .    0.00019     0.0029
 28         .         .   0.000025     0.0012
 30         .         .          .    0.00043
 32         .         .          .    0.00012
 34         .         .          .   0.000025
 36         .         .          .  0.0000028

                    Values of n
  S         6         7           10
  1     0.500     0.500        0.500
  3     0.360     0.386        0.431
  5     0.235     0.281        0.364
  7     0.136     0.191        0.300
  9     0.068     0.119        0.242
 11     0.028     0.068        0.190
 13    0.0083     0.035        0.146
 15    0.0014     0.015        0.108
 17         .    0.0054        0.078
 19         .    0.0014        0.054
 21         .   0.00020        0.036
 23         .         .        0.023
 25         .         .        0.014
 27         .         .       0.0083
 29         .         .       0.0046
 31         .         .       0.0023
 33         .         .       0.0011
 35         .         .      0.00047
 37         .         .      0.00018
 39         .         .     0.000058
 41         .         .     0.000015
 43         .         .    0.0000028
 45         .         .   0.00000028


When r = 2 the operation on f will result in two types of terms, those in which both θ's operate on one factor of f and those in which the operations operate on separate factors. When x = 1 these last vanish and thus

\[ n!\,\mu_2 = \frac{n!}{2}(1^2 + 1^2) + \frac{n!}{3}(2^2 + 0^2 + 2^2) + \ldots + \frac{n!}{n}\{(n-1)^2 + (n-3)^2 + \ldots + (n-3)^2 + (n-1)^2\}. \]

This may be summed by the ordinary methods of elementary algebra, and we find

\[ \mu_2 = \frac{n(n-1)(2n+5)}{18}. \tag{16.35} \]
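
The summation leading to (16.35) can be verified numerically. In the sketch below (function names ours) each factor of (16.32) contributes the sum of the squares of its exponents divided by the number of its terms, and the total is compared with the closed form.

```python
from fractions import Fraction


def var_s_from_factors(n):
    """Second moment of S as a sum of contributions from the factors of (16.32)."""
    total = Fraction(0)
    for r in range(1, n):
        # Exponents of the r-th factor: -r, -r + 2, ..., r - 2, r  (r + 1 terms).
        squares = sum(k * k for k in range(-r, r + 1, 2))
        total += Fraction(squares, r + 1)
    return total


def var_s_closed_form(n):
    """Right-hand side of (16.35)."""
    return Fraction(n * (n - 1) * (2 * n + 5), 18)


print(var_s_from_factors(9), var_s_closed_form(9))
```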
In a like manner it appears that

\[ \mu_4 = \frac{n(n-1)}{2700}\,(100n^4 + 328n^3 - 127n^2 - 997n - 372). \tag{16.36} \]



16.22. To prove that the distribution of τ tends to normality as n → ∞ we shall show that

\[ \frac{\mu_{2\alpha}}{\mu_2^{\alpha}} \to \frac{(2\alpha)!}{2^{\alpha}\,\alpha!}. \]

Consider the effect of operating on f in (16.32) by θ 2α times and then putting x = 1. There will appear terms like

\[ \frac{r^{a} + (r-2)^{a} + \ldots}{r+1}\cdot\frac{s^{b} + (s-2)^{b} + \ldots}{s+1}\cdots \]

etc. Any term with an odd superscript vanishes. Consider now the sum of terms like

\[ \frac{r^{2} + (r-2)^{2} + \ldots + r^{2}}{r+1}\cdot\frac{s^{2} + (s-2)^{2} + \ldots + s^{2}}{s+1}\cdots \quad (\alpha \text{ factors}). \tag{16.37} \]

It will be shown below that this term contributes the greatest power of n to the sum giving n! μ_{2α}.

In virtue of the multinomial form of Leibniz' theorem on the differentiation of a product, the factor by which this term is multiplied in the expansion of θ^{2α} f is

\[ \frac{(2\alpha)!}{(2!)^{\alpha}} = \frac{(2\alpha)!}{2^{\alpha}}. \]

Hence

\[ \mu_{2\alpha} \sim \frac{(2\alpha)!}{2^{\alpha}}\,(\text{sum of terms like (16.37)}). \tag{16.38} \]

Each factor of these terms is of type {r² + (r − 2)² + ... + (r − 2)² + r²}/(r + 1), i.e. is of order r²/3. The sum will then tend to the sum of terms like (1/3^α)(1² . 2² ... α²), each term containing α squares of the numbers 1, 2, ... n − 1. Call this π_α. Then π_α is 1/α! times the sum of terms in

\[ \frac{1}{3^{\alpha}}\,\{1^2 + 2^2 + \ldots + (n-1)^2\}^{\alpha} \tag{16.39} \]

which contain α different factors.

Now (16.39) is of order n^{3α}/9^α ∼ μ₂^α. Hence if α! π_α tends to equality with the sum (16.39), π_α ∼ μ₂^α/α! and hence, from (16.38),

\[ \mu_{2\alpha} \sim \frac{(2\alpha)!}{2^{\alpha}\,\alpha!}\,\mu_2^{\alpha}. \]

We have then to show that (16.39) tends asymptotically to the sum of its terms α! π_α, i.e. that sums of terms like

\[ 1^4 . 2^2 \ldots (\alpha - 1)^2, \qquad 1^6 . 2^2 \ldots (\alpha - 2)^2 \]

tend in comparison to zero. This may be shown inductively. Consider first

\[ \{1^2 + 2^2 + \ldots + (n-1)^2\}^2 = 2 . 3^2\,\pi_2 + \{1^4 + 2^4 + \ldots + (n-1)^4\}. \]

The expression on the left ∼ n⁶/9. But the sum of fourth powers on the right ∼ n⁵/5, which is of lower order. Hence the sum on the right ∼ 2 . 3² π₂. We then have

\[ \{1^2 + 2^2 + \ldots + (n-1)^2\}\{2 . 3^2\,\pi_2 + 1^4 + \ldots + (n-1)^4\} \sim 6 . 3^3\,\pi_3 + \text{terms of type } 1^4 . 2^2. \]

These terms will be less in sum than a constant multiple of

\[ \{1^2 + 2^2 + \ldots + (n-1)^2\}\{1^4 + 2^4 + \ldots + (n-1)^4\}, \]

which is of degree 8 in n. But the expression on the left is of degree 9. Hence

\[ \{1^2 + 2^2 + \ldots + (n-1)^2\}^3 \sim 6 . 3^3\,\pi_3, \text{ and so on.} \]

We can now justify the assertion that the maximum power of n arises from terms like (1² . 2² ... α²). In fact, by a similar line of reasoning to that just given it will appear that sums of terms like (1⁴ . 2² ... (α − 1)²) are of lower degree in n. This completes the demonstration.


16.23. In using the normal distribution to approximate to the S-distribution it is desirable to make a correction for continuity by subtracting unity (half the interval) from S in order to obtain the probability that a given value will be attained or exceeded. For instance, when n = 9 we have from (16.35)

\[ \operatorname{var} S = \frac{9 . 8 . 23}{18} = 92. \]

The normal deviate corresponding to S = 20 is then 19/√92 = 1.981. The probability of a normal deviate as great as or greater than this is 0.0238. The value from Table 16.4 is 0.022. Had we made no correction for continuity we should have found a normal deviate of 2.085 with a probability of 0.0185.
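
The calculation may be set out as a short Python sketch. The function names are our own; the normal tail area is obtained from the complementary error function of the standard library rather than from tables.

```python
import math


def normal_upper_tail(z):
    """Probability that a unit normal deviate is as great as or greater than z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def s_tail_normal(s, n, continuity=True):
    """Normal approximation to P(S >= s), using var S from (16.35)."""
    var_s = n * (n - 1) * (2 * n + 5) / 18
    z = ((s - 1) if continuity else s) / math.sqrt(var_s)
    return z, normal_upper_tail(z)


print(s_tail_normal(20, 9))                    # with the continuity correction
print(s_tail_normal(20, 9, continuity=False))  # without it
```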


Example 16.4

In 16.6 we found for a certain ranking of 10, τ = 0.111, S = 5. The Spearman coefficient for the same ranking, 0.139, has already been seen to be non-significant. What conclusion should we reach about τ on this point?

From Table 16.4 it is seen that the probability of a deviation greater than or equal to 5 is 0.364, and that of a deviation greater than or equal to 5 in absolute value is then 0.73 approximately. The corresponding value for ρ is 0.70. In either case the coefficient could well have arisen from an "independent" population and is not significant.


16.24. Different as ρ and τ are in conception and method of calculation, they are very closely related. It may be shown that for the population in which all rankings occur equally frequently

\[ \operatorname{cov}\{S, \Sigma(d^2)\} = -\frac{1}{18}\,n(n+1)^2(n-1) \]

from which the product-moment correlation between ρ and τ is

\[ \frac{2(n+1)}{\sqrt{2n(2n+5)}} \sim 1 - \frac{1}{4n} \tag{16.40} \]

(Kendall and others, 1938). For values of n occurring in practice the correlation between ρ and τ is thus very high. It also appears that the regression of ρ on τ is approximately linear over the material part of the range, that is, unless both are very close to unity. In such a case, recalling the values of the variances of the two coefficients, we shall have

\[ \frac{\rho}{\tau} \approx \sqrt{\frac{1}{n-1}\cdot\frac{18n(n-1)}{4(2n+5)}} \sim \frac{3}{2} \]

so that τ will be about two-thirds of the value of ρ when n is large.
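
The expression (16.40) may be confirmed for small n by enumerating every ranking and computing the two coefficients directly. The sketch below is ours; S is taken as the number of pairs in the natural order less the number in the inverse order, which is our reading of equation (16.9).

```python
import math
from itertools import permutations


def rho_tau_correlation(n):
    """Product-moment correlation of rho and tau over all n! rankings."""
    base = list(range(1, n + 1))
    rhos, taus = [], []
    for perm in permutations(base):
        d2 = sum((i - p) ** 2 for i, p in zip(base, perm))
        s = sum((perm[j] > perm[i]) - (perm[j] < perm[i])
                for i in range(n) for j in range(i + 1, n))
        rhos.append(1 - 6 * d2 / (n ** 3 - n))
        taus.append(2 * s / (n * (n - 1)))
    # Both coefficients have zero mean in this population.
    cov = sum(r * t for r, t in zip(rhos, taus))
    return cov / math.sqrt(sum(r * r for r in rhos) * sum(t * t for t in taus))


def correlation_formula(n):
    """Expression (16.40)."""
    return 2 * (n + 1) / math.sqrt(2 * n * (2 * n + 5))


for n in range(2, 7):
    print(n, round(rho_tau_correlation(n), 5), round(correlation_formula(n), 5))
```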


Grades 


16.25. Up to this point we have considered the problem of rank correlation without 
reference to any variate system which might underlie the rankings. In certain classes of 
inquiry this is inevitable ; for example, we might shuffle a pack of cards and use the rank 
correlation between the orders before and after shuffling to measure the efficacy of the 
process of mixing. The early theory of rank correlation was, however, developed from 
rather a different view-point. The qualities considered were measurable, and always in 
theory (and often in practice) it was possible to find a product-moment coefficient of
correlation. The use of Spearman's p was regarded as a substitute for such a coefficient, 
suitable either because the necessary measurements could not be carried out, whereas the 
ranking could, or because time was saved in working out rank correlations. 

It is not immediately evident what meaning can be attached to ranking in a continuous 
population, for the members thereof are not denumerable. 

The remark of 16.5 offers one way of overcoming the difficulty. The ranking of an 
individual as r can be regarded as a numerical statement to the effect that there are (r — 1) 
members “ above ” that individual, that is to say (r — 1) members who are given precedence. 
Quantities have already been considered in connection with continuous populations which 
express the same idea, namely, the quantiles. The pth decile, for example, is the variate- 
value such that p tenths of the total frequency lie below it. We will then define the grade 
of an individual as the proportion of the total frequency with a lower variate-value than 
that borne by that individual. If we have a discontinuous population N in number, the grade of an individual ranked according to the variate-values as r (from the lower to the higher values) will be (r − 1)/N. If the population is continuous its members cannot be ranked; but if we choose a sample of n members and rank them, an estimate of the grade of the rth member may be obtained by assuming that one-half of that member is to be assigned to each of the ranges into which its variate-value divides the variate-range, so that its grade is then taken to be

\[ \frac{r - 1 + \tfrac12}{n} = \frac{r - \tfrac12}{n}. \]


16.26. For a continuous bivariate population there will be no rank correlation, but there will, in general, be a grade correlation. Consider the bivariate normal population whose frequency function is

\[ z = \frac{1}{2\pi\sqrt{1 - \rho'^2}}\,\exp\left\{-\frac{1}{2(1 - \rho'^2)}\,(x^2 - 2\rho' xy + y^2)\right\} \tag{16.41} \]


GRADES 409 


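where, to avoid confusion with Spearman's ρ, we have denoted the product-moment coefficient by ρ'.

Let

\[ \xi = \int_{-\infty}^{x}\int_{-\infty}^{\infty} z\,dx\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac12 x^2}\,dx, \qquad \eta = \int_{-\infty}^{\infty}\int_{-\infty}^{y} z\,dx\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-\frac12 y^2}\,dy. \tag{16.42} \]

Then ξ and η are the grades and if x and y are independent so are ξ and η. ξ is a function of x and is distributed in the form

\[ dF = d\xi, \qquad 0 \le \xi \le 1 \tag{16.43} \]

and similarly for η. Thus the mean and variance of both ξ and η are ½ and 1/12 respectively. For the Spearman coefficient between ξ and η we may then take

\[ \rho = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \xi\eta\,z\,dx\,dy - 3 \tag{16.44} \]

remembering, however, that this is a generalisation of ρ to grades. From (16.44) we then have

\[ \frac{d\rho}{d\rho'} = 12\int\!\!\int \xi\eta\,\frac{\partial z}{\partial \rho'}\,dx\,dy. \]

Now

\[ \log z = -\frac{1}{2(1 - \rho'^2)}\,(x^2 - 2\rho' xy + y^2) - \tfrac12\log(1 - \rho'^2) - \text{constant}. \]

Thus

\[ \frac{1}{z}\frac{\partial z}{\partial \rho'} = \frac{\rho'}{1 - \rho'^2} + \frac{xy}{1 - \rho'^2} - \frac{\rho'(x^2 - 2\rho' xy + y^2)}{(1 - \rho'^2)^2} = \frac{1}{z}\frac{\partial^2 z}{\partial x\,\partial y} \]

and hence

\[ \frac{d\rho}{d\rho'} = 12\int\!\!\int \xi\eta\,\frac{\partial^2 z}{\partial x\,\partial y}\,dx\,dy. \]

By a partial integration with respect to x this is equal to

\[ 12\int dy\,\left[\xi\eta\,\frac{\partial z}{\partial y}\right]_{x=-\infty}^{x=\infty} - 12\int\!\!\int \frac{\partial \xi}{\partial x}\,\eta\,\frac{\partial z}{\partial y}\,dx\,dy. \]

The first term vanishes and thus

\[ \frac{d\rho}{d\rho'} = -12\int\!\!\int \frac{\partial \xi}{\partial x}\,\eta\,\frac{\partial z}{\partial y}\,dx\,dy. \]

By a partial integration with respect to y we find

\[ \frac{d\rho}{d\rho'} = 12\int\!\!\int \frac{\partial \xi}{\partial x}\,\frac{\partial \eta}{\partial y}\,z\,dx\,dy, \]

whence, from (16.42),

\[ \frac{d\rho}{d\rho'} = \frac{12}{4\pi^2\sqrt{1 - \rho'^2}}\int\!\!\int \exp\left\{-\frac{(2 - \rho'^2)x^2 - 2\rho' xy + (2 - \rho'^2)y^2}{2(1 - \rho'^2)}\right\}dx\,dy = \frac{6}{\pi\sqrt{4 - \rho'^2}}. \]

Integrating we have, since ρ vanishes with ρ',

\[ \rho = \frac{6}{\pi}\sin^{-1}\tfrac12\rho' \quad\text{or}\quad \rho' = 2\sin\frac{\pi\rho}{6}. \tag{16.45} \]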


16.27. This formula is due to K. Pearson, but its value is problematical. It represents the relationship between the product-moment and the grade correlations when the variates are normal. It has, however, been used to transform a rank correlation obtained from a small sample of n values into a putative product-moment coefficient in that sample, or, even worse, in the population from which the sample is derived, whether normal or not. The reader may care to list for himself the assumptions made in adopting such a procedure and to reflect on their justification. We shall not notice the process again, but we may note that in no case is ρ very different from 2 sin(πρ/6) in numerical value. If ρ = 0.6, 2 sin(πρ/6) = 0.618, and this is about the greatest difference that can occur.
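
The remark about the greatest difference can be checked by a simple tabulation. The sketch below (names ours) evaluates 2 sin(πρ/6) − ρ on a grid of values of ρ from 0 to 1; negative values behave symmetrically.

```python
import math


def pearson_from_grade(rho):
    """The transformation (16.45): rho' = 2 sin(pi * rho / 6)."""
    return 2 * math.sin(math.pi * rho / 6)


# Tabulate the difference rho' - rho at steps of 0.001 and pick out the largest.
diffs = [(k / 1000, pearson_from_grade(k / 1000) - k / 1000) for k in range(1001)]
rho_at_max, max_diff = max(diffs, key=lambda pair: pair[1])

print(round(pearson_from_grade(0.6), 3))
print(rho_at_max, round(max_diff, 4))
```

On this grid the greatest difference is about 0.018 and occurs a little below ρ = 0.6.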


16.28. Equation (16.45) has also been advocated as an easy, though perhaps 
inaccurate, method of calculating a product-moment coefficient. The idea is that when 
a set of bivariate values is given they shall be replaced by ranks, the rank coefficient 
calculated, and the value of ρ' derived from (16.45). Apart from the theoretical objections, such a procedure involves no saving of labour if the number of values is greater than 30 or 40. Various formulae have been offered for the standard error of an estimate of the parent product-moment correlation based on (16.45). Some of those in current statistical textbooks are incorrect, and it may be doubted whether the use of any one is justified. The
reader may consult Eells (1929) for a list of these formulae. 


The Case of m Rankings 


16.29. We now consider the more general case in which there are m rankings of n instead of two. Our problem is to discuss the general agreement among the set of m.

It is natural in the first instance to consider the average ρ or τ in the \(\binom{m}{2}\) possible pairs which can be chosen from the set of m. For example, if we have three rankings of six as follows:


e 
to 
w 
— 
on 
е 
аҥ в 


. (16.46) 


-the Spearman p's between PQ, QR and RP respectively are 11, — 12, — 19, so that the 
average ρ, say ρ_av, is equal to −9/35 = −0.26. We shall consider a slightly different coefficient linearly related to ρ_av.

Suppose we sum the ranks in the columns of (16.46), obtaining the numbers

11   8   8   14   11   11




THE CASE OF m RANKINGS 411 


These numbers must sum to 63 (and in general to ½mn(n + 1)) and reflect the degree of resemblance among the rankings. If the concordance were perfect the sums would be 3, 6, 9, 12, 15, 18, though not necessarily, of course, in that order, and in such a case would be as different as possible. On the other hand, when there is little or no resemblance, as in the example given, the sums are approximately equal. It is thus natural to take the variance of these sums as providing a measure of the ranking concordance.

Let S be the sum of the squares of deviations from the mean ½m(n + 1). If the concordance is perfect the sums are m, 2m, ... nm and the sum S is m²(n³ − n)/12. Write then

\[ W = \frac{12S}{m^2(n^3 - n)}. \tag{16.47} \]

Then W may vary from 0 to 1 and we shall call it the coefficient of concordance. In the above example it will be found that S = 25.5, W = 0.16.
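
The computation of S and W from the column sums may be sketched as follows. The function name is our own; the sums 11, 8, 8, 14, 11, 11 are those quoted above for three rankings of six.

```python
from fractions import Fraction


def concordance_from_sums(rank_sums, m):
    """Return (S, W) of (16.47) from the n column sums of m rankings."""
    n = len(rank_sums)
    mean = Fraction(m * (n + 1), 2)
    s = sum((Fraction(t) - mean) ** 2 for t in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return s, w


s, w = concordance_from_sums([11, 8, 8, 14, 11, 11], m=3)
print(float(s), round(float(w), 2))
```

With perfect concordance, e.g. sums 3, 6, 9, 12, 15, 18 for m = 3, the same routine gives W = 1.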


16.30. W is connected with ρ_av by the relation

\[ \rho_{av} = \frac{mW - 1}{m - 1}. \tag{16.48} \]

In fact, if the rankings, measured from the mean ½(n + 1), are x₁₁, x₁₂, ... x₁ₙ; x₂₁, ... x₂ₙ; ... x_{m1}, ... x_{mn}, the average ρ is

\[ \rho_{av} = \frac{1}{\binom{m}{2}}\cdot\frac{12}{n^3 - n}\sum_{i<k}\sum_{j=1}^{n} x_{ij}x_{kj} \]
\[ = \frac{12}{m(m-1)(n^3 - n)}\left\{\sum_{j=1}^{n}\Big(\sum_{i=1}^{m} x_{ij}\Big)^2 - \sum_{j=1}^{n}\sum_{i=1}^{m} x_{ij}^2\right\} \]
\[ = \frac{12}{m(m-1)(n^3 - n)}\left\{S - \frac{m(n^3 - n)}{12}\right\} \]
\[ = \frac{mW - 1}{m - 1}. \]

ρ_av is the intra-class correlation for the m sets of ranks considered as variate-values. It cannot be less than −1/(m − 1).
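
The relation (16.48) is an algebraic identity and may be tested on any set of rankings. The sketch below uses three rankings of six invented for the purpose (they are not those of (16.46)) and exact rational arithmetic; the function names are ours.

```python
from fractions import Fraction
from itertools import combinations


def spearman_rho(a, b):
    """Spearman's rho between two rankings of the same n objects."""
    n = len(a)
    return 1 - Fraction(6 * sum((x - y) ** 2 for x, y in zip(a, b)), n ** 3 - n)


def concordance_w(rankings):
    """Coefficient of concordance W of (16.47)."""
    m, n = len(rankings), len(rankings[0])
    sums = [sum(column) for column in zip(*rankings)]
    mean = Fraction(m * (n + 1), 2)
    s = sum((t - mean) ** 2 for t in sums)
    return 12 * s / (m * m * (n ** 3 - n))


# Hypothetical rankings, chosen only for illustration.
rankings = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [6, 4, 5, 1, 3, 2]]
pairs = list(combinations(rankings, 2))
rho_av = sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)
m, w = len(rankings), concordance_w(rankings)
print(rho_av, (m * w - 1) / (m - 1))
```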


16.31. To test whether an observed value of W is significant it is necessary to consider the distribution of W (or, more conveniently, of S) in the population obtained by permuting the n ranks in all possible ways in each of the m rankings. No generality is lost in supposing one ranking fixed and the others will then give rise to (n!)^(m−1) values of S. We will give the distributions for some low values of n and m and show how to approximate for larger values by the use of a continuous distribution.

For the case m = 2 the distribution of S is that of Σ(d²) in Table 16.1. The distributions have also been found for n = 3, m = 2 to 10; n = 4, m = 2 to 6; and n = 5, m = 3. Tables 16.5 to 16.8 give the probabilities based on these distributions in a form analogous to Tables 16.2 and 16.4.


TABLE 16.5

Concordance Coefficient W. Probability that a given Value of S will be Attained or Exceeded for n = 3 and Values of m from 2 to 10.

                                              Values of m
   S        2        3        4         5         6          7          8           9           10
   0    1.000    1.000    1.000     1.000     1.000      1.000      1.000       1.000        1.000
   2    0.833    0.944    0.931     0.954     0.956      0.964      0.967       0.971        0.974
   6    0.500    0.528    0.653     0.691     0.740      0.768      0.794       0.814        0.830
   8    0.167    0.361    0.431     0.522     0.570      0.620      0.654       0.685        0.710
  14        .    0.194    0.273     0.367     0.430      0.486      0.531       0.569        0.601
  18        .    0.028    0.125     0.182     0.252      0.305      0.355       0.398        0.436
  24        .        .    0.069     0.124     0.184      0.237      0.285       0.328        0.368
  26        .        .    0.042     0.093     0.142      0.192      0.236       0.278        0.316
  32        .        .   0.0046     0.039     0.072      0.112      0.149       0.187        0.222
  38        .        .        .     0.024     0.052      0.085      0.120       0.154        0.187
  42        .        .        .    0.0085     0.029      0.051      0.079       0.107        0.135
  50        .        .        .   0.00077     0.012      0.027      0.047       0.069        0.092
  54        .        .        .         .    0.0081      0.021      0.038       0.057        0.078
  56        .        .        .         .    0.0055      0.016      0.030       0.048        0.066
  62        .        .        .         .    0.0017     0.0084      0.018       0.031        0.046
  72        .        .        .         .   0.00013     0.0036     0.0099       0.019        0.030
  74        .        .        .         .         .     0.0027     0.0080       0.016        0.026
  78        .        .        .         .         .     0.0012     0.0048       0.010        0.018
  86        .        .        .         .         .    0.00032     0.0024      0.0060        0.012
  96        .        .        .         .         .    0.00032     0.0011      0.0035       0.0075
  98        .        .        .         .         .   0.000021    0.00086      0.0029       0.0063
 104        .        .        .         .         .          .    0.00026      0.0013       0.0034
 114        .        .        .         .         .          .   0.000061     0.00066       0.0020
 122        .        .        .         .         .          .   0.000061     0.00035       0.0013
 126        .        .        .         .         .          .   0.000061     0.00020      0.00083
 128        .        .        .         .         .          .  0.0000036    0.000097      0.00051
 134        .        .        .         .         .          .          .    0.000054      0.00037
 146        .        .        .         .         .          .          .    0.000011      0.00018
 150        .        .        .         .         .          .          .    0.000011      0.00011
 152        .        .        .         .         .          .          .    0.000011     0.000085
 158        .        .        .         .         .          .          .    0.000011     0.000044
 162        .        .        .         .         .          .          .  0.00000060     0.000020
 168        .        .        .         .         .          .          .           .     0.000011
 182        .        .        .         .         .          .          .           .    0.0000021
 200        .        .        .         .         .          .          .           .  0.000000099


TABLE 16.6

Concordance Coefficient W. Probability that a given Value of S will be Attained or Exceeded for n = 4 and m = 3 and 5.

   S     m = 3     m = 5    |     S        m = 5
   1     1.000     1.000    |    61        0.055
   3     0.958     0.975    |    65        0.044
   5     0.910     0.944    |    67        0.034
   9     0.727     0.851    |    69        0.031
  11     0.608     0.771    |    73        0.023
  13     0.524     0.709    |    75        0.020
  17     0.446     0.652    |    77        0.017
  19     0.342     0.561    |    81        0.012
  21     0.300     0.521    |    83       0.0087
  25     0.207     0.445    |    85       0.0067
  27     0.175     0.408    |    89       0.0055
  29     0.148     0.372    |    91       0.0031
  33     0.075     0.298    |    93       0.0023
  35     0.054     0.260    |    97       0.0018
  37     0.033     0.226    |    99       0.0016
  41     0.017     0.210    |   101       0.0014
  43    0.0017     0.162    |   105      0.00064
  45    0.0017     0.141    |   107      0.00033
  49         .     0.123    |   109      0.00021
  51         .     0.107    |   113      0.00014
  53         .     0.093    |   117     0.000048
  57         .     0.075    |   125    0.0000030
  59         .     0.067    |




TABLE 16.7

Concordance Coefficient W. Probability that a given Value of S will be Attained or Exceeded for n = 4 and m = 2, 4 and 6.

   S     m = 2       m = 4     m = 6    |     S         m = 6
   0     1.000       1.000     1.000    |    82         0.035
   2     0.958       0.992     0.996    |    84         0.032
   4     0.833       0.928     0.957    |    86         0.029
   6     0.792       0.900     0.940    |    88         0.023
   8     0.625       0.800     0.874    |    90         0.022
  10     0.542       0.754     0.844    |    94         0.017
  12     0.458       0.677     0.789    |    96         0.014
  14     0.375       0.649     0.772    |    98         0.013
  16     0.208       0.524     0.679    |   100         0.010
  18     0.167       0.508     0.668    |   102        0.0096
  20     0.042       0.432     0.609    |   104        0.0085
  22         .       0.389     0.574    |   106        0.0073
  24         .       0.355     0.541    |   108        0.0061
  26         .       0.324     0.512    |   110        0.0057
  30         .       0.242     0.431    |   114        0.0040
  32         .       0.200     0.386    |   116        0.0033
  34         .       0.190     0.375    |   118        0.0028
  36         .       0.158     0.338    |   120        0.0023
  38         .       0.141     0.317    |   122        0.0020
  40         .       0.105     0.270    |   126        0.0015
  42         .       0.094     0.256    |   128       0.00090
  44         .       0.077     0.230    |   130       0.00087
  46         .       0.068     0.218    |   132       0.00073
  48         .       0.054     0.197    |   134       0.00065
  50         .       0.052     0.194    |   136       0.00040
  52         .       0.036     0.163    |   138       0.00036
  54         .       0.033     0.155    |   140       0.00028
  56         .       0.019     0.127    |   144       0.00024
  58         .       0.014     0.114    |   146       0.00022
  62         .       0.012     0.108    |   148       0.00012
  64         .      0.0069     0.089    |   150      0.000095
  66         .      0.0062     0.088    |   152      0.000062
  68         .      0.0027     0.073    |   154      0.000046
  70         .      0.0027     0.066    |   158      0.000024
  72         .      0.0016     0.060    |   160      0.000016
  74         .     0.00094     0.056    |   162      0.000012
  76         .     0.00094     0.043    |   164     0.0000080
  78         .     0.00094     0.041    |   170     0.0000024
  80         .    0.000072     0.037    |   180    0.00000013


TABLE 16.8

Concordance Coefficient W. Probability that a given Value of S will be Attained or Exceeded
for n = 5 and m = 3.

   S     m = 3    |    S     m = 3
   0     1.000    |   44     0.236
   2     1.000    |   46     0.213
   4     0.988    |   48     0.172
   6     0.972    |   50     0.163
   8     0.941    |   52     0.127
  10     0.914    |   54     0.117
  12     0.845    |   56     0.096
  14     0.831    |   58     0.080
  16     0.768    |   60     0.063
  18     0.720    |   62     0.056
  20     0.682    |   64     0.045
  22     0.649    |   66     0.038
  24     0.595    |   68     0.028
  26     0.559    |   70     0.026
  28     0.493    |   72     0.017
  30     0.475    |   74     0.015
  32     0.432    |   76     0.0078
  34     0.406    |   78     0.0053
  36     0.347    |   80     0.0040
  38     0.326    |   82     0.0028
  40     0.291    |   86     0.0³90
  42     0.253    |   90     0.0⁴69


These distributions may be obtained by two methods. The first consists of building up the distribution for (m + 1) and n from that for m and n. For example, with m = 2 and n = 3 we have the following values of the sums of ranks, measured about their mean:

        Type           Frequency
     −2,  0,  2            1
     −2,  1,  1            2
     −1,  0,  1            2
      0,  0,  0            1

Here −2, 1, 1, and 2, −1, −1 are taken to be identical types, for they give the same value of S and will also give similar types when we proceed to m = 3 as follows.

In the case m = 3, each of the above types will appear added to the six permutations of −1, 0, 1; e.g. the type −2, 0, 2 will give one each of −3, 0, 3; −3, 1, 2; −2, −1, 3; −2, 1, 1; −1, −1, 2; and −1, 0, 1. These types are then counted for each of the types of the case m = 2, giving

        Type           Frequency
     −3,  0,  3            1
     −3,  1,  2            6
     −2,  0,  2            6
     −2,  1,  1            6
     −1,  0,  1           15
      0,  0,  0            2


The case m = 4 is treated by considering the numbers of types obtained by adding 
the six permutations of — 1, 0, 1 to the types for m = 3; and so on. 

This method is quite convenient for n = 2 and n = 3. For n = 4 it becomes difficult
owing to the labour of considering 24 permutations at each stage and to the increase in
the number of types. For n = 5 there are 120 permutations and the labour becomes
excessive.
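The building-up process just described can be mechanised. The following is a minimal sketch in Python, not the procedure actually used to compute the tables; the function names `canonical`, `s_distribution` and `tail_probability` are illustrative. Deviations are doubled so that they stay integral when n is even, and a type is identified with its mirror image, as in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def canonical(sums):
    """Treat a type and its mirror image (all signs reversed) as identical."""
    a = tuple(sorted(sums))
    b = tuple(sorted(-x for x in sums))
    return min(a, b)

def s_distribution(n, m):
    """Frequencies of S for m rankings of n objects, the first ranking held fixed."""
    # Doubled deviations from the mean rank (n + 1)/2 keep everything integral.
    base = tuple(2 * r - (n + 1) for r in range(1, n + 1))
    perms = list(permutations(base))
    types = Counter({canonical(base): 1})
    for _ in range(m - 1):                      # pass from m to m + 1 rankings
        nxt = Counter()
        for t, freq in types.items():
            for p in perms:
                nxt[canonical(tuple(x + y for x, y in zip(t, p)))] += freq
        types = nxt
    dist = Counter()
    for t, freq in types.items():
        dist[Fraction(sum(x * x for x in t), 4)] += freq   # undo the doubling
    return dist

def tail_probability(n, m, s):
    """Probability that S attains or exceeds s when the rankings are random."""
    dist = s_distribution(n, m)
    hits = sum(f for value, f in dist.items() if value >= s)
    return Fraction(hits, factorial(n) ** (m - 1))
```

For n = 3 and m = 3 the sketch reproduces the frequencies 1, 6, 6, 6, 15, 2 of the six types, and for larger n the number of types, rather than the number of permutations, governs the labour.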

The second method is a generalisation of the E-function of 16.12. For m rankings, the distribution of S is given by the expansion of an m-dimensional E-function. For example, with m = 3 there would be a three-dimensional E-function the bottom plane of which would be

[Array: bottom plane of the three-dimensional E-function]

The plane above this would be

[Array: second plane of the three-dimensional E-function]

and so on.
The E-function is difficult to handle in more than three dimensions, but for the two- and three-dimensional case it is manageable and was used to obtain the distribution of S for n = 5 and m = 3.


16.33. We now proceed to find the first four moments of the distribution of S. The method is similar to that used for ρ but is somewhat more complicated.

Writing $x_{ij}$ for the deviation from the mean $\tfrac12(n+1)$ of the jth member of the ith ranking, we have, as in 16.30,

$$W = \frac{m-1}{m}\,\rho_{av} + \frac{1}{m}. \tag{16.50}$$

Write

$$R_{ik} = \sum_{j=1}^{n} x_{ij}\,x_{kj}, \tag{16.51}$$

where i, k can have all values from 1 to m and thus any term $R_{pq}$ appears again as $R_{qp}$ in the sum $\sum R_{ik}$. Then the moments of W are derivable from those of the R's, which in turn are derivable from those of Spearman's ρ. In fact, writing $N = \tfrac{1}{12}(n^3 - n)$ we have from (16.18) and (16.19)

$$E(R_{ik}) = 0, \qquad E(R_{ik}^3) = 0,$$
$$E(R_{ik}^2) = \frac{N^2}{n-1}, \tag{16.52}$$
$$E(R_{ik}^4) = N^4\,\frac{3(25n^3 - 38n^2 - 35n + 72)}{25n(n+1)(n-1)^3}.$$

We next require the moments of

$$\rho_{av} = \frac{1}{m(m-1)N}\sum R_{ik}.$$

Certain complications arise because in some cases the R's are correlated among themselves. Any two R's are independent, i.e.

$$E(R_{ik}R_{lm}) = 0, \tag{16.53}$$

unless of course i = l, k = m. This may be seen by reference to (16.51), the x's being independent. Similarly

$$E(R_{ik}R_{lm}R_{pq}) = 0,$$

except when we have a set with "circular" suffixes such as

$$E(R_{ik}R_{kl}R_{li}), \tag{16.54}$$

for in this case the x's cease to be independent. Similarly any four R's are independent unless they form a set such as

$$R_{ik}R_{kl}R_{lm}R_{mi}. \tag{16.55}$$

We have

$$E(R_{ik}R_{kl}R_{li}) = \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n}\sum_{\gamma=1}^{n} E(x_{i\alpha}x_{k\alpha}\,x_{k\beta}x_{l\beta}\,x_{l\gamma}x_{i\gamma})$$
$$= E(x_{k\alpha}^2)\,E\Big\{\sum_{\alpha}x_{i\alpha}x_{l\alpha}\sum_{\gamma}x_{i\gamma}x_{l\gamma}\Big\} + E(x_{k\alpha}x_{k\beta})\,E\Big\{\sum_{\alpha\neq\beta}x_{i\alpha}x_{l\beta}\sum_{\gamma}x_{i\gamma}x_{l\gamma}\Big\}$$
$$= \{E(x_{k\alpha}^2) - E(x_{k\alpha}x_{k\beta})\}\,E\Big(\sum_{\gamma}x_{i\gamma}x_{l\gamma}\Big)^2$$
$$= N\Big\{\frac{1}{n} + \frac{1}{n(n-1)}\Big\}\frac{N^2}{n-1} = \frac{N^3}{(n-1)^2}. \tag{16.56}$$

We have

$$E(\rho_{av}) = \frac{1}{m(m-1)N}\,E\Big(\sum R_{ik}\Big) = \frac{1}{m(m-1)N}\sum E(R_{ik}) = 0. \tag{16.57}$$


$$E(\rho_{av}^2) = \frac{1}{m^2(m-1)^2N^2}\,E\Big(\sum R_{ik}\Big)^2 = \frac{1}{m^2(m-1)^2N^2}\,E\Big(2\sum R_{ik}^2\Big)$$
$$= \frac{2}{m^2(m-1)^2N^2}\,m(m-1)\,\frac{N^2}{n-1} = \frac{2}{m(m-1)(n-1)}. \tag{16.58}$$

$$E(\rho_{av}^3) = \frac{1}{m^3(m-1)^3N^3}\,E\Big(\sum R_{ik}\Big)^3 = \frac{1}{m^3(m-1)^3N^3}\,E\Big(\sum R_{ik}R_{kl}R_{li}\Big),$$

all other terms vanishing,

$$= \frac{8m(m-1)(m-2)}{m^3(m-1)^3N^3}\,E(R_{ik}R_{kl}R_{li}) = \frac{8(m-2)}{m^2(m-1)^2(n-1)^2}. \tag{16.59}$$

From these results we have, for the first three moments of W,

$$\mu_1' \text{ (about 0)} = \frac{1}{m}, \tag{16.60}$$
$$\mu_2 = \frac{2(m-1)}{m^3(n-1)}, \tag{16.61}$$
$$\mu_3 = \frac{8(m-1)(m-2)}{m^5(n-1)^2}. \tag{16.62}$$

In a similar way (we omit the lengthy algebra) it may be shown that

$$\mu_4 = \frac{24(m-1)}{m^7(n-1)^3}\Big\{\frac{25n^3 - 38n^2 - 35n + 72}{25(n^2+n)} + 2(n-1)(m-2) + \tfrac12(n+3)(m-2)(m-3)\Big\}. \tag{16.63}$$
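The moments of W may be checked against a complete enumeration when n and m are small. The following Python sketch is offered under that assumption only; the names `exact_w_moments` and `formula_w_moments` are illustrative, and the formulae coded are the mean 1/m and the central moments 2(m − 1)/{m³(n − 1)}, 8(m − 1)(m − 2)/{m⁵(n − 1)²} and the fourth moment of (16.63). Exact fractions are used so that agreement is tested without rounding.

```python
from fractions import Fraction
from itertools import permutations, product

def exact_w_moments(n, m):
    """Mean and 2nd to 4th central moments of W by full enumeration (small n, m only)."""
    ranks = tuple(range(1, n + 1))
    perms = list(permutations(ranks))
    s_max = Fraction(m * m * (n ** 3 - n), 12)
    mean_sum = Fraction(m * (n + 1), 2)
    values = []
    for rest in product(perms, repeat=m - 1):      # first ranking held fixed
        sums = [ranks[j] + sum(p[j] for p in rest) for j in range(n)]
        s = sum((t - mean_sum) ** 2 for t in sums)
        values.append(s / s_max)                   # W = S / max(S)
    count = len(values)
    mean = sum(values) / count
    central = [sum((w - mean) ** k for w in values) / count for k in (2, 3, 4)]
    return mean, central

def formula_w_moments(n, m):
    """The same four quantities from the closed formulae for the moments of W."""
    mu2 = Fraction(2 * (m - 1), m ** 3 * (n - 1))
    mu3 = Fraction(8 * (m - 1) * (m - 2), m ** 5 * (n - 1) ** 2)
    b = Fraction(25 * n ** 3 - 38 * n ** 2 - 35 * n + 72, 25 * n * (n + 1))
    mu4 = Fraction(24 * (m - 1), m ** 7 * (n - 1) ** 3) * (
        b + 2 * (n - 1) * (m - 2) + Fraction((n + 3) * (m - 2) * (m - 3), 2))
    return Fraction(1, m), [mu2, mu3, mu4]
```

For n = 3 and m = 3, for instance, both routes give a mean of 1/3 and a variance of 2/27.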


16.34. The distribution of W is evidently asymmetrical since $\mu_3 > 0$ unless m = 2. Consider then the possibility of approximating to the distribution by the Type I form

$$dF = \frac{1}{B(p,q)}\,W^{p-1}(1-W)^{q-1}\,dW, \qquad 0 \le W \le 1. \tag{16.64}$$

The first two moments of this are

$$\mu_1' \text{ (about 0)} = \frac{p}{p+q}, \qquad \mu_2 = \frac{pq}{(p+q)^2(p+q+1)}. \tag{16.65}$$

Identifying the values of (16.60), (16.61) and (16.65), we find

$$\frac{p}{p+q} = \frac{1}{m}, \qquad \frac{pq}{(p+q)^2(p+q+1)} = \frac{2(m-1)}{m^3(n-1)},$$

giving

$$p = \tfrac12(n-1) - \frac{1}{m}, \qquad q = (m-1)\Big\{\tfrac12(n-1) - \frac{1}{m}\Big\}. \tag{16.66}$$

It will be found that the third moment about the mean of the Type I form is

$$\frac{8(m-1)(m-2)}{m^4(n-1)(mn-m+2)} = \frac{8(m-1)(m-2)}{m^5(n-1)^2}\cdot\frac{m(n-1)}{m(n-1)+2},$$

so that the third moments of the W-distribution and the Type I distribution are approximately equal if m and n are not small. Similarly the fourth moments will be found to differ by a small quantity. We may therefore use the Type I distribution to approximate to that of W. It appears likely that as n, m → ∞ the distribution of W tends to the Type I form, but this has not been rigorously demonstrated.


16.35. The significance of W can then be tested in the Type I distribution, namely, by the use of incomplete B-functions. More conveniently, we may transform (16.64) to the form

$$dF \propto \frac{e^{\nu_1 z}\,dz}{(\nu_1 e^{2z} + \nu_2)^{\frac12(\nu_1+\nu_2)}}$$

by the transformation

$$z = \tfrac12\log_e\frac{(m-1)W}{1-W}, \qquad \nu_1 = (n-1) - \frac{2}{m}, \qquad \nu_2 = (m-1)\Big\{(n-1) - \frac{2}{m}\Big\},$$

and test in the z-distribution which has been tabulated.

In making this test it is desirable, for low values of m and n, to make the usual correction for continuity by subtracting unity from S (equation (16.47)) and increasing the divisor $\tfrac{1}{12}m^2(n^3-n)$ by 2. Let us examine the approximation of the test in some cases wherein the exact values are known from Tables 16.5 to 16.8.

For n = 3, m = 9, the 1 per cent. level is given approximately by S = 78 (Table 16.5). For such a value, with continuity corrections,

$$W = \frac{78 - 1}{\tfrac{1}{12}\cdot 81\cdot 24 + 2} = 0.4695, \qquad z = 0.979.$$

By linear interpolation of reciprocals in Appendix Table 5 we should require, for complete agreement, a value of z equal to 0.954.

For n = 4, m = 6, the 1 per cent. point is approximately S = 100. ν₁ = 8/3, ν₂ = 40/3. We have W = 0.5556, z = 0.916. From Table 16.5 we should require a value of 0.893.

For n = 5, m = 3, there is no very convenient value of S close to the 1 per cent. point. For P = 0.015, S = 74, and for P = 0.0078, S = 76.

    For S = 74 (with continuity corrections) z = 1.020
    For S = 76 (with continuity corrections) z = 1.089.

By interpolation from the tables z = 1.075. The use of the z-test would lead to the correct conclusion that a value of S equal to 74 falls below, and that of 76 above, the 1 per cent. point.

For values of m and n not included in Tables 16.5 to 16.8 it thus appears that the z-test with continuity corrections will give sufficiently accurate results, if n is greater than 3, at the 1 per cent. points. It may be presumed that the results at the 5 per cent. points are equally good and probably better. But for finer values of significance, such as 0.1 per cent., it is doubtful whether the test is sound. The tails of the distribution of S for moderate values of m and n are very irregular.
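The arithmetic of the z-test is easily mechanised. The sketch below, in Python, is an illustration only: the function names `kendall_w` and `z_statistic` and the keyword `continuity` are not part of the original account, and the reference of z to the tabulated z-distribution is left to the reader.

```python
from math import log

def kendall_w(s, n, m, continuity=False):
    """W = S divided by m^2 (n^3 - n)/12, optionally with the continuity correction."""
    divisor = m * m * (n ** 3 - n) / 12
    if continuity:
        # Subtract unity from S and increase the divisor by 2.
        return (s - 1) / (divisor + 2)
    return s / divisor

def z_statistic(w, n, m):
    """Return (z, nu1, nu2) for testing W in the z-distribution."""
    nu1 = (n - 1) - 2 / m
    nu2 = (m - 1) * nu1
    z = 0.5 * log((m - 1) * w / (1 - w))
    return z, nu1, nu2
```

With n = 3, m = 9 and S = 78 the corrected W is 77/164 and z comes to 0.979; with n = 5, m = 3 the values S = 74 and S = 76 give z = 1.020 and z = 1.089, in agreement with the figures quoted above.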




16.36. A somewhat more approximate test of W has been given by Friedman (1937), who defined a statistic

$$\chi_r^2 = m(n-1)W \tag{16.68}$$

and showed that the distribution of $\chi_r^2$ tends to that of the Type III $\chi^2$ as m tends to infinity, with (n − 1) degrees of freedom. This test appears reasonably satisfactory for moderate m and n, though not so accurate as ours. Friedman has also provided (1940) some significance levels of $\chi_r^2$ calculated on the basis of the z-test.


Example 16.5 

In some experiments in random series a pack of ordinary playing cards was shuffled and the order of the 13 cards of each suit from the top of the pack was noted. The pack was then reshuffled and again the orders noted. This was done 28 times. The question to discuss was whether the shuffling was good, in the sense that the cards were thoroughly mixed at each shuffle.

Here, for each suit, say diamonds, we have 28 rankings of 13. The sums of ranks were 183, 137, 171, 207, 188, 160, 225, 174, 216, 192, 236, 239, 220. The mean is 196, and S = 11,522; W (without continuity corrections, which are not worth making for these values of m and n) = 0.08075, z = 0.432. This falls just beyond the 1 per cent. point.

Similarly for the clubs W was found to be 0.0535; for the hearts, 0.0245; and for spades, 0.0342. None of these values is significant, and we conclude that the randomisation introduced by the shuffling was good, at all events, so far as this test was concerned. It may be added that the shuffling was done with much more care than would be taken in an ordinary game of cards.
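The computation of S and W from the sums of ranks, together with Friedman's statistic m(n − 1)W, may be sketched as follows in Python. The function name `concordance` is illustrative, and no significance level is computed.

```python
def concordance(rank_sums, m):
    """Return (S, W, Friedman's chi_r^2) from the n sums of ranks of m rankings."""
    n = len(rank_sums)
    mean = m * (n + 1) / 2                      # mean of the sums of ranks
    s = sum((t - mean) ** 2 for t in rank_sums)
    w = 12 * s / (m * m * (n ** 3 - n))
    chi_r2 = m * (n - 1) * w                    # referred to chi-squared with n - 1 d.f.
    return s, w, chi_r2

# Sums of ranks for the diamonds in the shuffling experiment (m = 28, n = 13).
diamonds = [183, 137, 171, 207, 188, 160, 225, 174, 216, 192, 236, 239, 220]
s, w, chi_r2 = concordance(diamonds, 28)
```

For the diamonds this gives S = 11,522 and W = 0.08075, as stated in the example.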


Example 16.6

In psychological work there has sometimes been a confusion between the determination of a measure of agreement between subjects and that of an objective order based on experimental rankings. It may therefore be as well to point out that in psychological applications the test of W is one of concordance between judgments. There may be quite a high measure of agreement about something which is incorrect.

A number of students were given 12 photographs of persons unknown to them, and asked to rank them in what they judged from the photographs to be their intelligence. For 16 students the sums of ranks were

    112, 94, 101, 84, 97, 75, 104, 84, 102, 146, 125, 124.

The mean is 104. S = 4472, W = 0.1222. z = 0.368, and is barely significant, being between the 1 per cent. and the 5 per cent. points.

For 111 students the sums were

    818, 670, 908, 410, 706, 526, 780, 485, 596, 1044, 959, 756

    W = 0.2378, z = 1.768.

This is highly significant and it is to be inferred that community of judgment exists between students or groups of students. But there was little relationship between the judgments and the intelligence of the photographed subjects as given by the Binet Intelligence Quotient.


Estimation of a True Ranking

16.37. Suppose we have m sets of n rankings which show a significant concordance. Assuming that the relations between the rankings reflect the true ranking of the objects, how are we to estimate that ranking? Or again, assuming merely a significant concordance between observers, what is the ranking "nearest" to their rankings?

A common-sense approach to this problem would probably lead us to this solution: the object whose true rank is 1 is that for which the sum of ranks is least; that whose rank is 2 is the one for which the sum of ranks is least but one; and so on. For example, if we have three rankings of five objects totalling 9, 7, 4, 10, 15, we should take the third as 1, the second as 2, the first as 3, the fourth as 4 and the fifth as 5.

This solution can be given a firmer theoretical basis. It is the "best" in a least-squares sense. In fact, suppose the true ranking is $X_1, X_2, \ldots, X_n$, where as usual the X's are a permutation of the first n integers. Suppose the sums of ranks are $S_1, S_2, \ldots, S_n$. Consider the sum

$$U = \sum (S_i - mX_i)^2. \tag{16.69}$$

If all the rankings were correct, each $S_i$ would be $mX_i$, so that this quantity represents in a sense the divergence from complete agreement. Our "best" estimate of the X's is given by minimising U. Now

$$U = \sum(S_i^2) + m^2\sum(X_i^2) - 2m\sum S_iX_i,$$

and this is a minimum when $\sum S_iX_i$ is a maximum. It is easy to see that this maximum is obtained by multiplying the biggest S by n, the next biggest by (n − 1), and so on. The result follows.

There is, of course, an indeterminacy in this method if any two of the S's are equal.
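The least-squares property may be verified directly for small n by comparing the common-sense rule with an exhaustive search over all permutations. The Python sketch below is illustrative only; the function names are assumptions, and tied sums of ranks are not resolved.

```python
from itertools import permutations

def estimate_ranking(rank_sums):
    """Rank the objects by increasing sum of ranks (ties are not resolved here)."""
    order = sorted(range(len(rank_sums)), key=lambda i: rank_sums[i])
    ranks = [0] * len(rank_sums)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

def u_criterion(rank_sums, ranking, m):
    """U = sum of (S_i - m X_i)^2 for a proposed true ranking X."""
    return sum((s - m * x) ** 2 for s, x in zip(rank_sums, ranking))

def brute_force_ranking(rank_sums, m):
    """The permutation of 1..n that minimises U, found by trying all n! of them."""
    n = len(rank_sums)
    return list(min(permutations(range(1, n + 1)),
                    key=lambda x: u_criterion(rank_sums, x, m)))
```

For the sums 9, 7, 4, 10, 15 of the text both functions return the ranking 3, 2, 1, 4, 5.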


Paired Comparisons

16.38. When the objects which are being ranked are known to be measurable according to some scale which is linear, no question as to the legitimacy of ranking arises. But cases arise in which it is by no means clear that ranking is legitimate, as for instance in the arranging of human beings according to intelligence or of pieces of music by human beings according to preference. To require an observer to carry out a ranking in such a case may be equivalent to asking him to arrange English towns in order of geographical position (which is two-dimensional) or a number of fruits according to taste (which is probably four-dimensional). The observer may attempt to comply in the full belief that he is doing something within his powers, but if the quality under consideration is not measurable on a linear scale the resulting ranking may fail to give a real picture either of his preferences or of the variation of the quality in the individuals. For example, in judgments of intelligence, it is not impossible that the observer should judge A more intelligent than B, B than C, and C than A, if the individuals are presented for his consideration one pair at a time. The likelihood of this happening is obviously increased when we are dealing with tastes in music, eatables or film stars; and in practice the event is not uncommon. Such "inconsistent" preferences can never appear in ranking, for if A is preferred to B and B to C, then A must automatically be shown as preferred to C.


16.39. We therefore consider a more general method of investigating preferences. With n objects, we shall suppose that each of the $\binom{n}{2}$ possible pairs is presented to an observer and his preference of one member of the pair noted. If the object A is preferred to B we write A→B or B←A. The $\binom{n}{2}$ preferences of a single observer may be represented in tabular form as shown in Table 16.9.

In this table, which is shown for the six objects A to F, an entry of unity in column Y and row X means X→Y, and is thus accompanied by a complementary zero in row Y and column X. The diagonals are blocked out. For example, in the table, A→B, A→C, D→A, etc.


TABLE 16.9

Tabular Representation of Paired Comparison Schema.

          A    B    C    D    E    F
     A    -    1    1    0    1    1
     B    0    -    0    1    1    0
     C    0    1    -    1    1    1
     D    1    0    0    -    0    0
     E    0    0    0    1    -    1
     F    0    1    0    1    0    -

The arrangement of the objects A to F in the row and column headings is quite arbitrary. There are (n!)² ways of representing the same configuration of preferences in such a table according to the permutations of objects in row and column; but in practice it is generally desirable to have the order in row and column the same, and even among the n! possible arrangements so given there are often practical considerations which determine one order as more convenient than others.


16.40. Paired comparisons may also be represented geometrically by a method which can be illustrated for the case of the six objects as follows:

[Figure: regular hexagon with vertices A to F, every pair of vertices joined by a line carrying an arrow]

Fig. 16.2. Geometrical Representation of the Scheme of Preferences of Table 16.9.

We represent the six objects A to F by the six vertices of a regular hexagon and join the vertices in all possible ways by straight lines. If A→B we draw an arrow on the line AB pointing from A to B. The arrows shown on Fig. 16.2 correspond to the preferences shown in Table 16.9.


16.41. If an observer makes preferences of type A→B→C→A we say that the triad ABC is inconsistent. In the geometrical representation an inconsistent triad is shown by a triangle in which all the arrows go round in the same direction. We may thus speak of a "circular" triad of preferences. In Fig. 16.2 the triads ACD, BEF and three others are circular.

It is also possible to have inconsistent polyads of greater extent; but any such circuit must contain at least two circular triads. Suppose, for instance, that ABCD is circular, i.e. that A→B→C→D→A. Then either A→C or C→A. In the first case ACD is circular, in the second ABC. Similarly either ABD or BCD is circular. Thus the circular tetrad must contain just two circular triads. On the other hand it is possible for a tetrad to contain circular triads without being itself circular.

Similarly, if ABCDE is circular either ABC or ACDE is circular and either BCD or ABDE is circular. If the two tetrads are circular there must be at least three circular triads (not necessarily four, because ADE may be common to both). It is easy to see from a diagram based on this configuration that there need not be more than three circular triads; and it is clear that there must be at least three. For if the tetrads are not circular then ABC and BCD must be so and then either CDE is circular or ABCE is so, adding at least one more.

Generally, it appears that a circular n-ad must contain at least (n − 2) circular triads; but it may contain more, and the fact that an n-ad contains (n − 2) circular triads does not mean that it is itself circular. In discussing inconsistences, therefore, it seems best to confine attention to circular triads, which, so to speak, constitute the inconsistent elements of the configuration, and to ignore the more ambiguous criteria associated with circular polyads of greater extent.
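Circular triads are readily found by direct search. The Python sketch below is illustrative: the matrix `PREFS` is one reading of the preferences of Table 16.9, with a unit in row X and column Y when X is preferred to Y, and the names `LABELS`, `PREFS` and `circular_triads` are assumptions of the sketch.

```python
from itertools import combinations

LABELS = "ABCDEF"
# PREFS[x][y] == 1 means x is preferred to y; diagonal entries are unused.
PREFS = [
    [0, 1, 1, 0, 1, 1],   # A
    [0, 0, 0, 1, 1, 0],   # B
    [0, 1, 0, 1, 1, 1],   # C
    [1, 0, 0, 0, 0, 0],   # D
    [0, 0, 0, 1, 0, 1],   # E
    [0, 1, 0, 1, 0, 0],   # F
]

def circular_triads(prefs):
    """List every triad (i, j, k) whose three arrows run round in the same direction."""
    found = []
    for i, j, k in combinations(range(len(prefs)), 3):
        # Circular when the arrows run i->j->k->i or i->k->j->i.
        if (prefs[i][j] and prefs[j][k] and prefs[k][i]) or \
           (prefs[i][k] and prefs[k][j] and prefs[j][i]):
            found.append((i, j, k))
    return found

names = {"".join(LABELS[v] for v in triad) for triad in circular_triads(PREFS)}
```

On this reading the search yields five circular triads, namely ABD, ACD, ADE, ADF and BEF.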


16.42. We now prove the following theorems:

(1) The maximum possible number of circular triads is $\dfrac{n^3-n}{24}$ if n is odd and $\dfrac{n^3-4n}{24}$ if n is even; and the minimum number is zero.

(2) These limits can always be attained by some configuration of preferences.

Consider a polygon of the type shown in Fig. 16.2 with n vertices. There will be (n − 1) lines emanating from each vertex. Let $\alpha_1, \alpha_2, \ldots, \alpha_n$ be the number of lines at the respective vertices on which the arrows leave the vertex. Then

$$\sum_{r=1}^{n}\alpha_r = \binom{n}{2}$$

and the mean value of α is $\tfrac12(n-1)$. Define

$$T = \sum\Big(\alpha - \frac{n-1}{2}\Big)^2 = \sum\alpha^2 - \frac{n(n-1)^2}{4}. \tag{16.70}$$

We now show that if the direction of a preference is altered and the effect is to increase the number of circular triads by d, T is reduced by 2d; and conversely. Consider the preference A→B. The only triads affected by altering this to B→A are those containing the line AB. Suppose there are α preferences of type A→X (including A→B) and β preferences of type B→X. Then four possible types of triad arise:

    A→X←B, say p in number
    A←X→B,
    A→X→B, which must number α − p − 1
    A←X←B, which must number β − p.

When the preference A→B is reversed the first two remain non-circular. The third becomes circular, the fourth ceases to be so. The reduction in the value of T is

$$\alpha^2 - (\alpha-1)^2 + \beta^2 - (\beta+1)^2 = 2(\alpha - \beta - 1) = 2d, \text{ say.}$$

The increase in the number of circular triads is

$$(\alpha - p - 1) - (\beta - p) = \alpha - \beta - 1 = d.$$

More generally, if as the result of reversing any number of preferences T is decreased by 2d, then d must be an integer and the number of circular triads must be increased by d. This clearly follows from the previous results, for the reversal of preferences can take place one at a time and the effect on T and the number of circular triads is cumulative.

We now investigate the maximum and minimum values of T. It is clear from the definition that T is greatest when the α's are the natural numbers 0, 1, ..., n − 1; and this is a possible case because it corresponds to ordinary ranking. Hence max. (T) = $\dfrac{n^3-n}{12}$.

For the minimum value, consider the polygon $A_1, A_2, \ldots, A_n$. Set up the preferences $A_1 \to A_2 \to \ldots \to A_n \to A_1$. Clearly at any vertex this results in one arrow entering and one leaving the vertex, i.e. the contribution to α is unity at each vertex. Next set up the preferences $A_1 \to A_3 \to A_5 \ldots$ This circuit may either visit each vertex once, or not. In the latter case we proceed to an unvisited vertex and set up the preferences $A_2 \to A_4 \to A_6 \ldots$ and so on. Again there will be a unit contribution to all the α's. We then set up the preferences $A_1 \to A_4 \to A_7 \to$, etc., and so on; and in this way we shall ultimately complete the preference scheme.

If n is odd all the preferences described will consist of circular tours of the polygon, and thus the value of α for each vertex will be $\tfrac12(n-1)$. If n is even, the last preference $A_r \to A_{r+n/2}$ will not be a tour but will consist of the single line joining one vertex with the symmetrically opposite vertex. Thus there will be $\tfrac12 n$ vertices for which $\alpha = \tfrac12 n$ and $\tfrac12 n$ vertices for which $\alpha = \tfrac12(n-2)$. In this case $T = \tfrac14 n$.

Now it is clear from the definition of T that it cannot be less than zero, or if n is even, be less than $\tfrac14 n$. The configuration just given shows that these minima are, in fact, attainable.

Thus T can vary from a maximum of $\dfrac{n^3-n}{12}$ to a minimum of zero or $\tfrac14 n$. Hence the maximum number of circular triads, being half the variation from maximum to minimum of T (the maximum of T corresponding to the ranking case in which there are no inconsistences), is $\dfrac{n^3-4n}{24}$ if n is even and $\dfrac{n^3-n}{24}$ if n is odd.

This establishes the two results enunciated at the beginning of this section.
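For small n the theorems may be confirmed by trying every configuration of preferences. The following Python sketch (an illustration under the assumption that exhaustive search is practicable, with the hypothetical names `triads_and_t` and `max_circular_triads`) checks for each configuration that the number of circular triads is half the difference between max. (T) and T, and records the largest number found.

```python
from fractions import Fraction
from itertools import combinations, product

def triads_and_t(n, arrows):
    """Return (d, T) for one configuration; arrows[(i, j)] is True when i -> j (i < j)."""
    def beats(a, b):
        return arrows[(a, b)] if a < b else not arrows[(b, a)]
    d = 0
    for i, j, k in combinations(range(n), 3):
        if (beats(i, j) and beats(j, k) and beats(k, i)) or \
           (beats(i, k) and beats(k, j) and beats(j, i)):
            d += 1
    # alpha for a vertex is the number of arrows leaving it.
    alphas = [sum(beats(v, w) for w in range(n) if w != v) for v in range(n)]
    t = sum((a - Fraction(n - 1, 2)) ** 2 for a in alphas)
    return d, t

def max_circular_triads(n):
    """Largest d over all configurations, checking d = (max T - T)/2 on the way."""
    pairs = list(combinations(range(n), 2))
    t_max = Fraction(n ** 3 - n, 12)
    best = 0
    for bits in product((False, True), repeat=len(pairs)):
        d, t = triads_and_t(n, dict(zip(pairs, bits)))
        if 2 * d != t_max - t:
            raise ValueError("relation between d and T violated")
        best = max(best, d)
    return best
```

The search gives maxima of 1, 2 and 5 circular triads for n = 3, 4 and 5, in accordance with the two formulae.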


Coefficient of Consistence in Paired Comparisons

16.43. If d is the number of circular triads in an observed configuration of preferences we define

$$\zeta = 1 - \frac{24d}{n^3 - n}, \quad n \text{ odd}; \qquad \zeta = 1 - \frac{24d}{n^3 - 4n}, \quad n \text{ even}, \tag{16.71}$$

and call ζ the coefficient of consistence. If and only if it is unity, there are no inconsistences in the configuration, which may therefore be represented by a ranking. As ζ decreases to zero the inconsistence, as measured by the number of circular triads, increases.

For example, in the configuration of Fig. 16.2 there are five circular triads, ABD, ACD, AFD, AED and BEF. The maximum possible number is 8. Thus ζ = 0.375.

ζ can also be interpreted in the light of Table 16.9. Suppose, in that table, we sum the rows. (The column sums are determined by the row sums and add no fresh information.) The sum of any row will be the α-number for that vertex in the polygon which corresponds to the object defining the row. T will then be the value of the sum of squares of deviations of row totals from the mean value $\tfrac12(n-1)$; that is to say, will be the variance of the row sums multiplied by n. ζ is thus a linear function of this variance; but it cannot be tested in the χ²-distribution as if Table 16.9 were a contingency table, for the border cells are not independent or linearly dependent.
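Since d is determined by T, the coefficient may be computed from the row sums alone. A minimal Python sketch follows; the function name `coefficient_of_consistence` is illustrative, and the row sums 4, 2, 4, 1, 2, 2 are those of the six objects of Table 16.9 as read here.

```python
from fractions import Fraction

def coefficient_of_consistence(row_sums):
    """Return (d, zeta) from the row sums (alpha-numbers) of a preference table."""
    n = len(row_sums)
    t = sum((a - Fraction(n - 1, 2)) ** 2 for a in row_sums)
    d = (Fraction(n ** 3 - n, 12) - t) / 2          # half the fall of T below its maximum
    if n % 2:
        d_max = Fraction(n ** 3 - n, 24)            # n odd
    else:
        d_max = Fraction(n ** 3 - 4 * n, 24)        # n even
    return d, 1 - d / d_max

d, zeta = coefficient_of_consistence([4, 2, 4, 1, 2, 2])
```

Here T = 7.5, whence d = 5 and ζ = 3/8 = 0.375.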


16.44. If an individual observer produces a configuration of preferences which show inconsistence there are usually several explanations; he may be an incompetent judge, the objects may be so alike that consistent differentiation is not possible, or his attention may wander during the course of the experiment. We discuss these questions later. They are mentioned here to explain the motive for the next stage of the mathematics. With what probability can a value of ζ arise by chance if the observer allots his preferences at random with respect to the quality under consideration?

With n objects there are $2^{\binom{n}{2}}$ possible configurations of preferences. We proceed to investigate the distribution of d in this population of $2^{\binom{n}{2}}$ different members. The method consists of proceeding from the distribution for n to that for (n + 1).

For n = 3 there are eight configurations, of which two give one circular triad and six no circular triads. Consider the effect of adding a new vertex D to the vertices ABC. Four cases arise:

(1) D→ all A, B, C.

(2) D→ two of A, B, C.

(3) D→ one of A, B, C.

(4) D→ none of A, B, C.

The last two are symmetrical with the first two and need not be separately considered.

Situation (1) arises in one way and clearly does not add any new circular triads other than those already existing in the configuration ABC. It therefore contributes six values d = 0 and two values d = 1. So does situation (4).

Situation (2) arises in three ways, according as D←A, B, or C. The configurations so reached are similar and we may take any one, say D←C, as the single preference. If A←C then DAC is not circular and if B←C then DBC is not circular. On the other hand A→C and B→C will each produce a circular triad. We then have the cases

                      No. of Circular
                       Triads added.
    A←C→B                   0
    A→C→B                   1
    A←C←B                   1
    A→C←B                   2

We now consider AB. In the first and fourth cases just enumerated the direction of AB does not matter and no circular triads are added. With the second A→B gives no circular triad but A←B adds one. With the third A→B adds one and A←B adds none.

Thus the number of circular triads occurring for these four cases is found to be

    No. of Circular
        Triads.         Frequency.
          0                 2
          1                 2
          2                 4

We must multiply the frequency by three and by two to allow for similar symmetrical arrangements, and, adding the contributions of situations (1) and (4), the final results are

    No. of Circular
        Triads.         Frequency.
          0                24
          1                16
          2                24
                          ----
        TOTAL              64

The principles of this method are clear enough and the work may be formalised by a number of conventions which we omit to save space. In common with many similar combinatorial problems, however, troubles arise from the sheer number of possibilities and the difficulty of ensuring that nothing is overlooked. Up to the present the distribution of d for n up to and including 7 is known. The frequencies and probabilities are given in Table 16.10.
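For small n the distribution of d can also be reached by brute enumeration of every configuration, which serves as a check on the step-by-step method. The Python sketch below is illustrative only and is not the method of the text; the function name `d_distribution` is an assumption, and the labour doubles with every added pair, so that the sketch is suitable only for n up to 6 or so.

```python
from collections import Counter
from itertools import combinations, product

def d_distribution(n):
    """Frequencies of the number of circular triads over all configurations."""
    pairs = list(combinations(range(n), 2))
    triads = list(combinations(range(n), 3))
    counts = Counter()
    for bits in product((0, 1), repeat=len(pairs)):
        arrow = dict(zip(pairs, bits))     # arrow[(i, j)] == 1 means i -> j, for i < j
        d = 0
        for i, j, k in triads:
            ij, jk, ik = arrow[(i, j)], arrow[(j, k)], arrow[(i, k)]
            # Circular: i->j->k->i (ij, jk set, ik clear) or i->k->j->i (ik set, others clear).
            if (ij and jk and not ik) or (ik and not jk and not ij):
                d += 1
        counts[d] += 1
    return dict(counts)
```

For n = 4 the enumeration returns the frequencies 24, 16 and 24 obtained above.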


Paired Comparisons for m Observers: Coefficient of Agreement

16.45. We now consider the investigation of similarities of judgments for m observers. Suppose that in a table of the form of Table 16.9 we enter a unit in the cell in row X and column Y whenever X→Y and count the units in each cell. A cell may then contain any number from 0 to m. If the observers are in complete agreement there will be $\binom{n}{2}$ cells containing the number m, the remaining $\binom{n}{2}$ cells being zero. The agreement may be complete even if there are inconsistences present.

Suppose that the cell in row X and column Y contains the number γ. Let

$$\Sigma = \sum\binom{\gamma}{2}, \tag{16.72}$$

the summation extending over the n(n − 1) cells of the table (the diagonal cells being ignored). Σ is then the sum of the number of agreements between pairs of judges. Put

$$u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1. \tag{16.73}$$


TABLE 16.10

Paired Comparisons. Frequency (f) of Values of d and Probability (P) that Values will
be Attained or Exceeded.

 Value     n = 2        n = 3        n = 4         n = 5           n = 6              n = 7
 of d.    f    P       f    P       f    P        f     P        f      P          f        P
   0      2  1.000     6  1.000    24  1.000     120  1.000      720  1.000       5,040  1.000
   1                   2  0.250    16  0.625     120  0.883      960  0.978       8,400  0.998
   2                               24  0.375     240  0.766    2,240  0.949      21,840  0.994
   3                                             240  0.531    2,880  0.880      33,600  0.983
   4                                             280  0.297    6,240  0.792      75,600  0.967
   5                                              24  0.023    3,648  0.602      90,384  0.931
   6                                                           8,640  0.491     179,760  0.888
   7                                                           4,800  0.227     188,160  0.802
   8                                                           2,640  0.081     277,200  0.713
   9                                                                            280,560  0.580
  10                                                                            384,048  0.447
  11                                                                            244,160  0.263
  12                                                                            233,520  0.147
  13                                                                             72,240  0.036
  14                                                                              2,640  0.001

 TOTAL    2            8           64           1,024          32,768         2,097,152

The maximum number of agreements, occurring if $\binom{n}{2}$ cells each contain m, is $\binom{n}{2}\binom{m}{2}$, and thus in the case of complete agreement, and only in this case, u = 1. The further we go from this case, as measured by agreements between pairs of observers, the smaller u becomes. The minimum number of agreements occurs when each cell contains $\tfrac12 m$ if m is even or $\tfrac12(m \pm 1)$ if m is odd. That is, if m is even, the minimum number of agreements is

$$2\binom{n}{2}\binom{\tfrac12 m}{2} = \tfrac14 m(m-2)\binom{n}{2}$$

and in this case

$$u = -\frac{1}{m-1}. \tag{16.74}$$

When m is odd the minimum value of u is found to be

$$u = -\frac{1}{m}. \tag{16.75}$$


16.46. We shall call u the Coefficient of Agreement. It is unity if and only if there is complete agreement in the comparisons. Its minimum value is not −1 except when m = 2. This, however, is to be expected in a measure of agreement, for there can be no such thing as complete disagreement among three or more observers in paired comparisons. If observer P differs in certain comparisons from observers Q and R, the two latter must agree on those comparisons.

When m = 2, u reduces to

$$u = \frac{2\Sigma}{\binom{n}{2}} - 1 \tag{16.76}$$

and Σ becomes the number of cases in which the two observers agree about a comparison. u is thus a generalisation of the coefficient τ. For general m, if the entries in the table were constrained to the ranking type, u would be the average intercorrelation τ between observers taken two at a time.
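The computation of Σ and u from a table of counts may be sketched in Python as follows. The function name `coefficient_of_agreement` and the layout of `counts` (a square table with `counts[x][y]` the number of observers preferring x to y) are assumptions of the sketch.

```python
from fractions import Fraction
from math import comb

def coefficient_of_agreement(counts, m):
    """Return (Sigma, u) for a table of paired-comparison counts from m observers."""
    n = len(counts)
    # Sum the number of agreeing pairs of observers over the n(n - 1) off-diagonal cells.
    sigma = sum(comb(counts[x][y], 2)
                for x in range(n) for y in range(n) if x != y)
    u = Fraction(2 * sigma, comb(m, 2) * comb(n, 2)) - 1
    return sigma, u
```

With four observers in complete agreement about three objects the sketch gives u = 1; if every cell contains 2 it gives u = −1/3, the minimum for m = 4.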


16.47. In discussing the significance of u it is desirable to know whether the set of preferences which give rise to it could have arisen by chance if the preferences had been assigned at random with respect to the quality under consideration. The procedure which first suggests itself is a generalisation of the method used for the case of m rankings. That is to say, we sum the entries in the rows of the table and consider the variance of these entries. If the preferences are allotted at random we expect to find about equal numbers given to each object, and the variance will be low; in other cases it will be higher.

The difficulty about this suggestion is that it has not been found possible to ascertain the distribution of the variance in the $2^{m\binom{n}{2}}$ possible sets of preferences. The case m = 1, corresponding to the distribution of d for inconsistences, is difficult enough to solve. For higher values of m no distributions are known except in trivial cases.

A test can, however, be devised by using the coefficient u. Consider one cell in the table in row X and column Y and let it contain the number γ. Then the corresponding cell in row Y and column X will contain m − γ. Thus these two contribute to Σ the amount

$$\binom{\gamma}{2} + \binom{m-\gamma}{2}.$$

Now, of the total ways in which the units can be distributed in the first cell there will be $\binom{m}{\gamma}$ in which γ units occur. Consequently the distribution of Σ in the cell and the corresponding cell is given by the expression

$$f(t) = \frac{1}{2^m}\sum_{\gamma=0}^{m}\binom{m}{\gamma}\,t^{\binom{\gamma}{2}+\binom{m-\gamma}{2}}, \tag{16.77}$$

the coefficient of a power of t being the probability that the pair of cells contributes that amount; and since the distribution in other pairs of cells is independent if the preferences are allotted at random the distribution of Σ for the whole table is given by

$$\{f(t)\}^N, \tag{16.78}$$

where $N = \binom{n}{2}$.
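The expansion of the Nth power may be carried out mechanically. The Python sketch below is one way of doing so and is offered as an illustration; the function names `sigma_distribution` and `tail` are assumptions, and exact fractions are used throughout.

```python
from fractions import Fraction
from math import comb

def sigma_distribution(m, n):
    """Exact distribution of Sigma under random preferences, as {value: probability}."""
    # One pair of related cells: gamma follows the binomial (1/2 + 1/2)^m.
    single = {}
    for g in range(m + 1):
        value = comb(g, 2) + comb(m - g, 2)
        single[value] = single.get(value, 0) + Fraction(comb(m, g), 2 ** m)
    dist = {0: Fraction(1)}
    for _ in range(comb(n, 2)):               # N = C(n, 2) independent pairs of cells
        nxt = {}
        for s, p in dist.items():
            for v, q in single.items():
                nxt[s + v] = nxt.get(s + v, 0) + p * q
        dist = nxt
    return dist

def tail(dist, s):
    """Probability that Sigma attains or exceeds s."""
    return sum(p for value, p in dist.items() if value >= s)
```

For m = 4 and n = 2 the sketch gives probabilities 1, 0.625 and 0.125 of attaining or exceeding Σ = 2, 3 and 6 respectively.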


16.48. The distributions have been worked out for the following values of m and n: m = 3, n = 2 to 8; m = 4, n = 2 to 6; m = 5, n = 2 to 5; m = 6, n = 2 to 4. Tables 16.11 to 16.14 give the probabilities based on these distributions, i.e. the probabilities that a given value of Σ will be attained or exceeded.

For constant n the distribution tends to the Type III form as m tends to infinity. In fact, for a single pair of related cells the variate-value corresponding to a frequency $\binom{m}{\gamma}$ is $\binom{\gamma}{2} + \binom{m-\gamma}{2}$, which is a quadratic in γ. Were the variate-value a linear function of γ the distribution for the single cell would tend to normality in accordance with the well-known property of the binomial. The case of the quadratic value corresponds to a transformation of the variate of the type x² = y, and the transform of the normal form exp (−x²) dx becomes the Type III form $\exp(-y)\,y^{-\frac12}\,dy$. Since the N cells are independent and the sum of variates in the same Type III form is also distributed in that form, it follows that Σ is in the limit distributed as $\exp(-\Sigma)\,\Sigma^{\frac12 N - 1}\,d\Sigma$ except perhaps for some constants. Thus Σ or some multiple of it is distributed as χ².

For constant m the distribution tends to normality with increasing n.


TABLE 16.11 


Agreement in Paired Comparisons. The Probability P that a Value of Σ will be Attained
or Exceeded, for m = 3, n = 2 to 8.


n = 2     n = 3     n = 4     n = 5     n = 6     n = 7     n = 8

[The body of the table is illegible in this copy; only the extreme upper-tail entries for n = 8 survive.]


72 | 0.0942 
74 | 0.0936 
76 | 0.01024. 
78 | 0.01113 
80 | 0.01248 
| 82 | 0-01412 




TABLE 16.12 


Agreement in Paired Comparisons. The Probability P that a Value of Σ will be Attained
or Exceeded, for m = 4 and n = 2 to 6 (for n = 6 only Values beyond the 1 per cent. Point
are given).
" { | | 
n=2 | n-23 n=4 n=5 т = 5 п= 6 п = 6 
| | 
| | | 
sl elsi» | elie [= hee ке о CINE NE 
| $ | | | 
| { 
2 1-000 6 | 1-000 12 | 1:000 20 | 1.000 42 | 0.0048, 57 0-014 79 | 0.0542 
3 0-625 7 | 0-947 13 | 0-997 21 | 1-000 | 43 | 0-0030| 58 | 0-0092| 80 | 0.0528 
6 0-125 8 | 0-736 14 | 0-975 | 22 | 0:999 | 44 0:0017 | 59 | 0.0058, 81 | 0-0°98 
9 | 0:455 15 | 0-901 23 | 6-995 | 45 | 0:073 60 | 0-0037| 82 0:0°%15 
10 | 0.330 16 | 0-769 24 | 0-979 46 | 0.0341| 61 | 0-0022| 83 0-0°12 
11 0:277 17 | 0-632 25 | 0:942 | 47 0.0324| 62 | 0-0013| 84 0-010951 
12 | 0-137 18 | 0-524 26 | 0-882 48 | 0-0:90 | 63 | 0.0?76 | 86 0-02130 
14 | 0:043 19 | 0-410 27 | 0-805 49 | 0-0:37 | 64 | 0-0244 | 87 0-021117 
15 | 0-025 20 | 0-278 28 | 0-719 | 50 | 0-0?25| 65 | 0.0323 | 90 0-01328 
18 | 0.0020| 21 0-185 29 | 0-621 51 | 0.0593 | 66 | 0-0?13 
22 | 0-137 30 | 0-514 52 | 0-:0521 | 67 | 0-0172 
23 | 0-088 31 | 0-413 53 | 0-0517 | 68 | 00:36 
24 | 0-044 32 | 0:327 54 | 0.0974 | 69 | 0-018 
25 | 0:027 33 | 0.249 56 | 0.0766 | 70 | 0.0597 
'e6 | 0-019 | 34 | 0:179 | 57 | 00738 | 71 0:0547 
27 | 0-0079) 35 | 0:127 60 | 0-0°93 | 72 | 0-0520 
98 | 0-0030! 36 | 0-090 73 | 0.0510 
29 | 0-0025| 37 | 0-060 74 | 0-0651 
30 | 0-0011| 38 | 0-038 75 | 0.0918 
32 | 0.0316 | 39 | 0-024 76 | 0-0778 
33 | 0.01395 | 40 | 0-016 77 | 0-0744 
36 | 0.0538 | 41 | 0-0088 78 | 0-0715 
L. k, | 
TABLE 16.13


Agreement in Paired Comparisons. The Probability P that a Value of Σ will be Attained
or Exceeded, for m = 5 and n = 2 to 5.


n=2 n=3 n=4 n= 5 n=5 
Dy Р > P ON p i = D = P 
i 
4 1-000 ` 12 1:000 24 1-000 40 1:000 76 0-0450 
.9 0:375 14 0:756 26 0-940 42 0-991 78 0-016 
i0 0-063 16 0-390 28 0-762 44 0-945 80 0-0550 
18 0:207 30 0-538 46 0-843 82 0:0515 
20 0-103 32 0-353 48 0-698 84 0-0639 
| 22 0-030 34 0-208 50 0-537 86 0-0°10 
24 0-011 36 0-107 52 0-384 88 0-0723 
26 0-0039 38 0-053 54 0:254 90 0-0553 
30 0.0324 40 0-024 56 0-158 92 0-0812 
42 0-0093 58 0-092 94 0-0?14 
44 0-0036 60 0-050 96 0-010946 
46 0-0012 62 0-026 100 0-012901 
48 0-0336 64 0-012 
50 0.0212 66 0:0057 
52 0-0428 68 0-0025 
54 0:0554 70 0-0010 
| | 56 0-0518 72 0-0239 
| 60 0-0760 74 0-0?14 




TABLE 16.14 


Agreement in Paired Comparisons. The Probability P that a Value of Σ will be Attained
or Exceeded, for m = 6 and n = 2 to 4.


| 
n=2 n= 8 n=4 | n=4 n=4 
x P 23 P 2 P 2 P x p 
| 

6 1-000 18 1-000 36 1-000 55 0-043 74 0.0112 
7 0-688 19 0-969 37 0-999 56 0-029 75 0:0559 
10 0-219 20 0-832 38 0-991 | 57 0-020 16 0:0549 
15 0-031 21 0-626 39 0-959 | 58 0-016 77 0-0532 
22 0-523 40 0-896 | 59 0-011 80 0.0568 
23 0-468 | 41 0-822 60 0-0072 81 0-0817 
24 0-303 42 0-755 61 0-0049 82 0.0512 
26 0-180 43 0-669 62 0.0034 | 85 0-0734 
27 0-147 44 0-556 63 0-0025 | 90 0-0893 

28 0-088 45 0-466 64 0-0016 

29 0-061 46 0-409 05 0-0283 

30 0-040 41 | 0337 | 66 0-0?06 

31 0-034 48 0-257 | 67 0-0?48 

32 0-023 49 0-209 | 68 0-0326 

35 0-0062 50 0-175 69 0-0216 

36 0-0029 51 0-133 70 0-086 

37 0-0020 52 0-097 | 71 0-0*68 

40 0:058 53 0.073 | 72 0-0448 

| 45 0-0231 54 0057 | 73 0-0?16 


16.49. The first of these results suggests that the Type III distribution will provide
an approximation to the distribution (16.78) when m is moderately large. We proceed to
find the first four moments of (16.78).

It is sufficient to find the first four moments of (16.77), those of (16.78) being obtainable
therefrom in virtue of the relationships which connect cumulants of independent distributions.

The rth moment of (16.77) about the origin is given by

$$2^{m}\mu'_{r} = \sum_{\gamma=0}^{m} \binom{m}{\gamma}\left\{\binom{\gamma}{2}+\binom{m-\gamma}{2}\right\}^{r}, \qquad (16.79)$$

since $2^{m}$ is the total frequency. Thus we have


m 


Әти = 25 (oe -mr + T ”) = (2) ar ("уне — mr)  . (16.80) 
T=0 


Sere ancl ав z(^) can be obtained by operating on the binomial (1 + x)” p times 
5 


by a, e.g. we find 




and hence, substituting in (16.80), 


uit (D. s. v ж Ez CIO) 


с AES?) 


п 
| 
ba 
T x(” = — ES +17 аний E 


These are the moments of Σ. Those of u are obtained by dividing by an appropriate
power of $\tfrac12 N\binom{m}{2}$, and it may be noted in particular that the mean of u is zero.
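The lower moments can be checked numerically. The sketch below enumerates one pair of cells and uses the additivity of the first three cumulants over the N independent pairs. Two things in it are assumptions rather than quotations from the text: the definition u = 2Σ / {C(m, 2) C(n, 2)} − 1, and the closed forms Nm(m − 1)/4, Nm(m − 1)/8 and Nm(m − 1)(m − 2)/8 noted in the comments, which follow from the enumeration and are not a transcription of (16.82). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pair_central_moments(m):
    """Mean and second and third central moments of one pair's contribution."""
    values = [(comb(g, 2) + comb(m - g, 2), Fraction(comb(m, g), 2 ** m))
              for g in range(m + 1)]
    mean = sum(v * p for v, p in values)
    mu2 = sum((v - mean) ** 2 * p for v, p in values)
    mu3 = sum((v - mean) ** 3 * p for v, p in values)
    return mean, mu2, mu3


def sigma_moments(m, n):
    """Mean, variance and third central moment of Sigma.

    Cumulants of orders 1 to 3 add over independent pairs and coincide with
    the mean and the second and third central moments.
    Assumed closed forms: N m(m-1)/4, N m(m-1)/8, N m(m-1)(m-2)/8.
    """
    N = comb(n, 2)
    mean, mu2, mu3 = pair_central_moments(m)
    return N * mean, N * mu2, N * mu3


def u_mean(m, n):
    """Mean of u under the assumed definition u = 2*Sigma/(C(m,2)*C(n,2)) - 1."""
    mean_sigma, _, _ = sigma_moments(m, n)
    return 2 * mean_sigma / (comb(m, 2) * comb(n, 2)) - 1


if __name__ == "__main__":
    print(sigma_moments(4, 5), u_mean(4, 5))
```

Exact fractions are used so that the check of the zero mean of u is not blurred by rounding.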


16.50. The first four moments of the Type III distribution

$$dF = k\,e^{-px}\,x^{q-1}\,dx$$

are

$$\mu'_{1} = \frac{q}{p}, \qquad \mu_{2} = \frac{q}{p^{2}}, \qquad \mu_{3} = \frac{2q}{p^{3}}, \qquad \mu_{4} = \frac{3q(q+2)}{p^{4}}.$$

Equating the second and third moments to those given by (16.82) we find

$$q = \frac{Nm(m-1)}{2(m-2)^{2}}, \qquad p = \frac{2}{m-2}.$$

To make the first moments correspond we move the origin of the Σ-distribution a distance
$\tfrac12 N\binom{m}{2}\dfrac{m-3}{m-2}$ to the right. We thus reach the approximation to the Σ-distribution,
coinciding in the first three moments,

$$dF = k\,e^{-\frac{2x}{m-2}}\,x^{\frac{Nm(m-1)}{2(m-2)^{2}}-1}\,dx,$$

where $x = \Sigma - \tfrac12 N\binom{m}{2}\dfrac{m-3}{m-2}$;
or, transforming to the more usual χ² form by putting $\chi^{2} = \dfrac{4x}{m-2}$, we find that

$$\chi^{2} = \frac{4}{m-2}\left\{\Sigma - \tfrac12 N\binom{m}{2}\frac{m-3}{m-2}\right\} \qquad (16.84)$$

is distributed as χ² with

$$\nu = \frac{Nm(m-1)}{(m-2)^{2}} \qquad (16.85)$$

degrees of freedom.


The fourth moments of Σ and the χ² approximation differ by terms of order N⁻¹ and
m⁻¹ compared with their absolute values.
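As an illustrative sketch of how the approximation is applied, the function below evaluates χ² = 4/(m − 2) · {Σ − ½N C(m, 2)(m − 3)/(m − 2)} and ν = Nm(m − 1)/(m − 2)², with N = C(n, 2). The function name and argument order are hypothetical; any continuity correction is left to the caller, who subtracts half the interval between successive values of Σ before calling.

```python
from math import comb


def chi2_approximation(sigma, m, n):
    """Return (chi2, nu) for the Type III approximation; requires m > 2."""
    if m <= 2:
        raise ValueError("the approximation needs m > 2")
    N = comb(n, 2)
    # distance by which the origin of the Sigma-distribution is moved
    shift = 0.5 * N * comb(m, 2) * (m - 3) / (m - 2)
    chi2 = 4.0 / (m - 2) * (sigma - shift)
    nu = N * m * (m - 1) / (m - 2) ** 2
    return chi2, nu


if __name__ == "__main__":
    # m = 4, n = 5, Sigma = 40 with a continuity correction of 0.5
    print(chi2_approximation(40 - 0.5, 4, 5))
```

With m = 4, n = 5 and a corrected Σ of 39.5 this returns χ² = 49 on 30 degrees of freedom. Note that ν need not be an integer, as for m = 6, n = 4.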




16.51. It only remains to be seen how large m and n must be for this to provide
a satisfactory approximation.

Consider first the distributions for m = 3. When n = 8, N = 28, we have, for the
approximation, 4Σ distributed with 168 degrees of freedom. From Table 16.11 we see
that for Σ = 54, P = 0.011 and for Σ = 58, P = 0.0011. Applying a continuity correction
by deducting unity from Σ we find for the χ² approximation with χ² = 4 × 53, ν = 168,
P = 0.011, and with χ² = 4 × 57, P = 0.00114. The correspondence is very close, in
spite of the low value of m.

For m = 4, n = 5, N = 10, the approximation gives 2Σ − 30 distributed with
30 degrees of freedom. For Σ = 40 and 41, this gives, with continuity corrections of 0.5,
half the variate-interval, χ² = 49 and 51, ν = 30. From the diagram at the end it is seen
that these values lie one on either side of the 1 per cent. value; and this is in accordance
with the exact values of P, which are seen from Table 16.12 to be 0.016 and 0.0088. Similarly
we find that the values of Σ, 37 and 38, lie on either side of the 5 per cent. level, which is
again in accordance with the exact values, P = 0.060 and 0.038.

For m = 6, n = 4, N = 6, the approximation gives Σ − 33.75 distributed with
11.25 degrees of freedom. For Σ = 59 and 60 the corresponding χ² values are seen to
lie on either side of the 1 per cent. point, which accords with the exact values of Table 16.14.

We conclude that the χ² approximation provides an adequate test of significance for
the values of m and n outside the range for which Tables 16.13 and 16.14 give exact values.


Example 16.7 


A class of boys (ages 11 to 13 inclusive) were asked to state their preferences with
respect to certain school subjects. Each child was given a sheet on which were written
the possible pairs of subjects and asked to underline the one preferred in each case. The
results were as follows:


21 boys, 13 school subjects. The preferences are shown in Table 16.15, which is in
the form described in 16.39; e.g. there were 18 boys who preferred Art to Religion.
TABLE 16.15 


Preferences of 21 Boys in 13 Subjects. 
1 2s 30 Sh 8 9 10 11 12 13 TOTALS 


- 
1. Woodwork XL та DORT OLD 16 16 18 18 18 20 21 20 211 
2. Gymnastics п = 14 "12 18818 34 е 16 20 16 18 19 188 
3. Art 1 # eo OMe 10 16 18 16 16 17 16 19 160 
4. Scionce 6 9 11 — 11 12 15 14 13 ney aie aei iy 154 
5. History 6 8 7 10 — 14 11 12 14 15 13 14 16 140 
6. Geography Sabie Wl. uh ues Cdi EN: 16 15 17| 137 
Жул ОКС T ЫШ н te E 116 
8. Religion CN а е T rm 
9. English Literature 5 NEN. Le 58 10 9 a 10 18 13 15 | 106 
10. Commercial subjects $ Trog 6 В Р Ü s T 0 10 14 91 
11. Algebra ] "Bec v А er eh 10 13|) 82 
12. English Grammar 07 ЗИ п B 8 5 Spt JUN E T5 81 
13. Geometry 1 @' BF DEED 4 d Чак FR 
| - Toran | 1638 


The calculation of Σ for this table, in which the objects are arranged in order of total
number of preferences, may be shortened by noting that Σ, as given by equation (16.72),
may be transformed into the form

$$\Sigma = \sum\gamma^{2} - m\sum\gamma + \binom{m}{2}\binom{n}{2},$$

where the summation now takes place over the half of the table below the diagonal. Since
the numbers in this half are smaller than those in the other half there is a considerable
saving in arithmetic.

We find Σ = 9718

and hence $u = \dfrac{2 \times 9718}{\binom{21}{2}\binom{13}{2}} - 1 = 0.187$.


There is thus a certain amount of agreement among the children, indicated by the
positive value of u. Is this significant?
We note first of all that this distribution of preferences could not have arisen by chance
to any acceptable degree of probability. In fact, χ² = 412.4 (equation (16.84)) and ν = 90.7.
The large value of ν justifies the use of the normal approximation to the χ²-distribution
and we find √(2χ²) − √(2ν − 1) = 15.3, a very improbable result on the hypothesis of
a random allocation of preferences.
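The figures of this example can be retraced in a few lines. The sketch assumes the definition u = 2Σ / {C(m, 2) C(n, 2)} − 1 for the coefficient of agreement, which is defined earlier in the chapter and not in this passage. It takes Σ = 9718 from the text and does not recompute it from Table 16.15. The variable names are illustrative only.

```python
from math import comb, sqrt

m, n = 21, 13          # boys and school subjects
sigma = 9718           # value of Sigma quoted in the example
N = comb(n, 2)         # number of pairs of subjects

# coefficient of agreement (assumed definition, see the note above)
u = 2 * sigma / (comb(m, 2) * N) - 1

# chi-square approximation, equations (16.84) and (16.85)
chi2 = 4 / (m - 2) * (sigma - 0.5 * N * comb(m, 2) * (m - 3) / (m - 2))
nu = N * m * (m - 1) / (m - 2) ** 2

# normal approximation to chi-square for large nu
deviate = sqrt(2 * chi2) - sqrt(2 * nu - 1)

if __name__ == "__main__":
    print(round(u, 3), round(chi2, 1), round(nu, 1), round(deviate, 1))
```

The deviate is referred to the unit normal distribution; a value above 15 lies far beyond any conventional significance point.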

The distribution of circular triads was as follows :—


No. of Triads.   Frequency.     No. of Triads.   Frequency.
      0              1                12              1
      1              1                17              3
      4              5                21              1
      6              2                25              1
      7              2                29              1
      8              1                39              1
     10              1
                                     TOTAL           21


The total number of circular triads was 242 with a mean of 11.5. Only one boy was
entirely consistent. On the other hand, for n = 13 the maximum number of circular
triads is 91, with a mathematical expectation of 71.5. It is thus clear that, except perhaps
for one boy, we cannot suppose that any boy allotted preferences at random. We are
again led to conclude that the boys are genuinely capable of making distinctions, and that
consistently on the whole. Half the boys have coefficients of consistence ζ greater than 0.92.

We conclude that the boys can make preferences and that in their view the subjects
are sufficiently different to enable a reasonably consistent set of preferences to be made.
So far as these data are concerned there would be no objection to the assumption that
a scale of preferences can be set up. With this in mind, we can say that the value of
u indicates a certain amount of agreement, though not a strong one, between the boys as
to which subjects they prefer.
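The summary figures for the circular triads can be verified as follows. The sketch rests on three standard results for paired comparisons with odd n, stated here as assumptions because they are established earlier in the chapter: the maximum number of circular triads is (n³ − n)/24, the expected number under random allocation is C(n, 3)/4, and the coefficient of consistence is ζ = 1 − 24d/(n³ − n). The three frequencies that are illegible in the printed table (for 0, 10 and 12 triads) are taken as 1 each, which is the only assignment giving 21 boys and 242 triads in all.

```python
from math import comb

n = 13  # number of subjects compared

# number of circular triads -> number of boys (illegible entries taken as 1)
triad_frequencies = {0: 1, 1: 1, 4: 5, 6: 2, 7: 2, 8: 1, 10: 1,
                     12: 1, 17: 3, 21: 1, 25: 1, 29: 1, 39: 1}

boys = sum(triad_frequencies.values())
total_triads = sum(d * f for d, f in triad_frequencies.items())
mean_triads = total_triads / boys

max_triads = (n ** 3 - n) // 24        # maximum for odd n
expected_triads = comb(n, 3) / 4       # expectation under random preferences


def consistence(d):
    """Coefficient of consistence for odd n."""
    return 1 - 24 * d / (n ** 3 - n)


boys_above_092 = sum(f for d, f in triad_frequencies.items()
                     if consistence(d) > 0.92)

if __name__ == "__main__":
    print(boys, total_triads, round(mean_triads, 1),
          max_triads, expected_triads, boys_above_092)
```

Eleven of the 21 boys have ζ above 0.92, which is the sense of "half the boys" in the text.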




NOTES AND REFERENCES 

Spearman has suggested another coefficient of rank correlation, viz.

$$R = 1 - \frac{3\Sigma|d|}{n^{2}-1},$$

but this "footrule" is unreliable as a measure of dependence; it cannot, for example,
attain −1. For earlier work on rank correlation see Spearman (1904, 1906), K. Pearson
(1907) and "Student" (1921). The distribution of ρ in the case of independence was
given by Kendall and others (1939). Pitman (1937) had previously suggested that it
could be approximately represented by the B-distribution.

The coefficient τ was suggested by Kendall in 1938. In practice ρ is probably more
convenient. It is, however, remarkable that τ is unique among correlation coefficients in
depending only on linear processes, so that machines may be constructed to calculate it.
Furthermore, τ can be adapted to give partial rank correlation coefficients (Kendall, 1942).

The problem of m rankings was considered by Friedman in 1937 and by Babington
Smith and Kendall and by Wallis in 1939. Friedman (1940) has reviewed this work and
provided some useful tables based on the Type I approximative distribution. Wallis has
pointed out that the coefficient W is the ranking analogue of the correlation ratio. Kelley
(Statistical Method) had considered ρ_av as a measure of concordance in rankings.

For the further theory of rank correlation see my Rank Correlation Methods, 1948,
Charles Griffin & Co., London.


REFERENCES 

Eells, W. C. (1929), "Formulas for probable errors of coefficients of correlation," J. Amer.
Statist. Ass., 24, 170.

Friedman, M. (1937), "The use of ranks to avoid the assumption of normality implicit
in the analysis of variance," Jour. Amer. Statist. Ass., 32, 675.

—— (1940), "A comparison of alternative tests of significance for the problem of m
rankings," Ann. Math. Statist., 11, 86.

Hotelling, H., and Pabst, M. R. (1936), "Rank correlation and tests of significance
involving no assumption of normality," Ann. Math. Statist., 7, 29.

Kendall, M. G. (1938), "A new measure of rank correlation," Biometrika, 30, 81.

——, Kendall, S. F. H., and Babington Smith, B. (1939), "The distribution of
Spearman's coefficient of rank correlation in a universe in which all rankings
occur an equal number of times," Biometrika, 30, 251.

—— and Babington Smith, B. (1939), "The problem of m rankings," Ann. Math. Statist.,
10, 275.

—— and Babington Smith, B. (1940), "On the method of paired comparisons," Biometrika,
31, 324.

—— (1942), "Partial Rank Correlation," Biometrika, 32, 277.

Pearson, K. (1907), "On further methods of determining correlation," Drapers' Co. Memoirs,
Biometric Series IV, London, Dulau & Co.

Pitman, E. J. G. (1937), "Significance tests which may be applied to samples from any
populations: Part II. The correlation coefficient test," J. Roy. Statist. Soc. Suppl.,
4, 225, and (1938) "Part III. The analysis of variance," Biometrika, 29, 322.

Spearman, C. (1904), "The proof and measurement of association between two things,"
Amer. J. Psychol., 15, 88.

—— (1906), "A footrule for measuring correlation," Brit. Jour. Psychol., 2, 89.

"Student" (1921), "An experimental determination of the probable error of Dr. Spearman's
correlation coefficients," Biometrika, 13, 263.

Wallis, W. A. (1939), "The correlation ratio for ranked data," Jour. Amer. Statist. Ass.,
34, 533.


EXERCISES 


16.1. Show that the coefficients of rank correlation ρ between the natural order 1,
. . . 10 and the following rankings are −0.37 and +0.45 respectively.
7, 10, 4, 1, 6, 8, 9, 5, 2, 3;
10, 1, 2, 3, 4, 5, 6, 7, 8, 9.


Show that the corresponding values of τ are −0.24 and +0.60.
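A short numerical check of this exercise is sketched below, assuming the usual definitions ρ = 1 − 6Σd²/(n³ − n) and τ = 2S/{n(n − 1)}, where S is the number of concordant pairs less the number of discordant pairs relative to the natural order. The second ranking is read as 10, 1, 2, ..., 9, which is the reading consistent with the stated answers. The function names are hypothetical.

```python
def spearman_rho(ranks):
    """Spearman's rho between the natural order 1..n and the given ranking."""
    n = len(ranks)
    d2 = sum((r - i) ** 2 for i, r in enumerate(ranks, start=1))
    return 1 - 6 * d2 / (n ** 3 - n)


def kendall_tau(ranks):
    """Kendall's tau between the natural order 1..n and the given ranking."""
    n = len(ranks)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a pair in natural order, -1 for an inverted pair
            s += 1 if ranks[j] > ranks[i] else -1
    return 2 * s / (n * (n - 1))


first = [7, 10, 4, 1, 6, 8, 9, 5, 2, 3]
second = [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]

if __name__ == "__main__":
    for ranking in (first, second):
        print(round(spearman_rho(ranking), 2), round(kendall_tau(ranking), 2))
```

The second ranking shows how differently the two coefficients weigh a single displaced member: one large displacement lowers ρ more than τ.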


16.2. Defining
χ_r² = m(n − 1)W


show that approximately χ_r² is distributed as χ² in the Type III form with ν = n − 1
degrees of freedom. (Friedman, 1937.)


16.3. Show that W is the ratio of the sum of squares between columns and the total
sum of squares (the rankings being regarded as arrayed one below the other) and hence 
that W is the square of the correlation ratio 7y, for such an array (the ranks being regarded 
as variate-values). The "sum of squares between columns" means the sum of squares
of deviations of column means from their mean. (Wallis, 1939.)
16.4. Show that Spearman's "footrule"

$$R = 1 - \frac{3\Sigma|d|}{n^{2}-1}$$


can attain, but not exceed, ће value 1, and сап be as small as, but not smaller than, — 3. 


16.5. Verify formula (16.63). 


16.6. The following table shows the preferences of 25 girls in 11 school subjects. 


т 2 3 A b 6 % вази TOTALS 


1. Gymnastics — 10 19 iT 80 17 31 21 21 15 22 186 
2. Science ` IH — 12% 15 17 165 21 19 18S 16 17 165 
3. Art 6 18 — 16 її 18 10 17 16 19 16 147 
4. Domestie Science в 30 9 — 16 11 18 Jb 34 1l. 74 121 
5. History b 8 9 Ө — 14 18 12 19 J5 18 121 
6. Arithmetic 8 10 7 14 11 — 12 13 12 16 18 121 
7. Geography 4 4 15 12 7 19 — 144086 14 ns 112 
8. English Literature 4.6 8 10 d9 19 Ш = 15 5 14 105 
9. Religion a) ij 33 ВЕ ese 105 
10. Algebra п d$ ЧА m 9 Delo es 12 104 
ll. English Grammar 3 8 gr 1 m que Se se = 8s 

TOTAL 1375 


Show that the coefficient of agreement u is 0.082; that this is significant; but that
the girls are less alike in preferences than the boys of Example 16.7.


APPENDIX TABLES 


APPENDIX TABLE 1


Normal Distribution. Frequency Function of the Normal Distribution at every Tenth of the
Standard Deviation, with First and Second Differences. The value of the central ordinate
at zero is 1/√(2π).


| 
|o s ARS) Ы 2 ys. a=). 4°, 
| | — 
0:0 0-39894 199 . — 392 2.5 0-01753 395 + 79 
0-1 0-39695 591 — 374 2-6 0-01358 316 + 66 
0-2 0-39104 965 — 347 2-7 0-01042 250 + 53 
0:3 0:38139 1312 — 308 2-8 0-00792 197 + 45 \, 
0-4 0:36827 1620 : — 265 2-9 0-00595 152 + 36 2 
0-5 0-35207 1885 — 212 3-0 0-00443 116 + 27 
0:6 0-33322 2097 — 159 3-1 0:00327 89 + 23 ^ 
0-7 0-31225 2256 — 104 3-2 0-00238 66 +17 
0-8 0-28969 2360 — 52 ‚3-3 0-00172 49 + 13 
0-9 0-26609 2412 0 3:4 0-00123 | 36 + 10 
10 0-24197 2412 + 46 3:5 0-00087 | 26 + 7 
1-1 0-21785 2366 + 84 3-6 0-00061 19 + 6 
1-2 0:19419 . 2282 + 118 3-7 0-00042 13 + 4 
1:3 0:17137 2164 + 143 3:8 0-00029 9 + 2 
14 0-14973 2021 + 161 3:9 0-00020 + 8 
15 0-12952 1860 + 173 4:0 0-00013 4 — 
16 0-11092 1687 +177 41 0-00009 3 == 
17 0:09405 1510 + 177 42 0-00006 2 — 
1:8 0:07895 1333 | + 170 4-3 0-00004 2 ed 
1-9 0-06562 1163 + 162 44 0-00002 — — 
2-0 0-05399 1001 + 150 45 0-00002 — = 
2-1 0-04398 851 + 137 46 0-00001 — — 
2.2 0-03547 714 + 120 4-7 0-00001 — — 1 
2.3 0-02833 594 + 108 4:8 0-00000 — — 
2:4 | 0-02239 486 + 91 
| = 
Precision of Interpolation.—Owing to the magnitude of the second differences, simple interpolation
near the beginning of the table may give an error of up to 5 in the fourth place; the use of second
differences will bring this down to 1 or 2 in the last place, third differences being small. Where third
differences are greatest, in the neighbourhood of x/σ = 0.6, the error may be as large as 3 in the last
place unless the third difference is used.




APPENDIX TABLE 2 


Normal Distribution. The Distribution Function F of the Normal Distribution, tabulated at 
every Tenth of the Standard Deviation, with First and Second Differences. 


z, F. 49+). AX—). z, F. A+). (=). 
| 

0-0 0:50000 3983 40 2-5 0-99379 155 36 
0-1 0-53983 3943 78 2-6 0-99534. 119 28 
0.2 0-57926 3865 114 2-7 0-99653 91 22 
0:3 0-61791 3751 147 2.8 0-99744 69 17 
0-4 0-65542 3604 175 2-9 0-99813 52 14 
0-5 0-69146 3429 200 3-0 0:99865 38 10 
0-6 012515 3229 219 34 0-99903 28 7 
07 0.75804 3010 230 3-2 0-99931 21 1 
0-8 0-78814 2780 240 3-3 0:99952 14 3 
0-9 0-81594 2540 241 34 0-99966 11 4 
1-0 0-84134. 2299 239 3:5 0-99977 7 — 
11 0-86433 2060 233 3:6 0-99984 5 — 
1-2 0-88493 1827 223 3-7 0:99989 4 — 
1:3 0-90320 1604 209 3:8 0-99993 2 — 
1:4 0-91924 1395 194 3-9 0.99995 2 — 
Lë 0-93319 1201 178 4-0 0-99997 1 as 
16 0-94520 1023 159 41 0-99998 1 23 
LT 0-95543 864 143 42 0-99999 = = 
1:8 0:96407 721 124 ‚483 0-99999 — — 
1:9 0:97128 597 108 44 0-99999 — = 
2-0 0:97725 489 93 

2-1 0-98214 396 78 

2-2 0-98610 318 66 

2:3 0-98928 252 53 

2.4 0-99180 199 44 


F attains the exact value 0-99999 between 4-26 and 4-27. 


Precision of Interpolation.—Simple interpolation may lead to an error of 3 or 4 at most in the fourth 
place of decimals in the region where second differences are large; the use of the second difference will 
bring this down to 2 or 3 in the last place, the largest errors tending to occur at the beginning of the 
table, where the third difference may be used if the greatest possible precision is desired.
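Entries of Appendix Tables 1 and 2, and the size of the error of simple interpolation, can be checked with the standard library alone. The sketch below is an illustration and not the method by which the tables were computed; the function names are hypothetical. It uses the identity F(x) = ½{1 + erf(x/√2)} for the distribution function.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x):
    """Frequency function of the unit normal distribution (Appendix Table 1)."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def normal_cdf(x):
    """Distribution function of the unit normal distribution (Appendix Table 2)."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def linear_interpolation_error(x, step=0.1):
    """Error of simple interpolation in a 5-figure table of F with the given step."""
    k = int(x / step + 1e-9)              # index of the tabular point at or below x
    lo, hi = k * step, (k + 1) * step
    f_lo, f_hi = round(normal_cdf(lo), 5), round(normal_cdf(hi), 5)
    interpolated = f_lo + (f_hi - f_lo) * (x - lo) / step
    return interpolated - normal_cdf(x)


if __name__ == "__main__":
    print(round(normal_pdf(0.0), 5), round(normal_cdf(1.0), 5))
    print(linear_interpolation_error(0.95))
```

Near x = 0.95, where the second differences of F are largest, simple interpolation is in error by about 3 in the fourth decimal place, in line with the note on precision.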


t-Table. The Distribution Function of y= 


2\ +1 
e 
v 


APPENDIX 


(Condensed to three figures from the four-figure 


= i 
5. 6. cH 8. 9. 10. 
0-500 0-500 0 0-500 0-500 
0-538 0-538 9 0-539 0:539 
0-576 0-576 0:577 0-577 0-577 
0:613 0-614 0-614 0-6145 0-615 
0-6485 0-6495 0-650 0-651 0-651 
0-683 0-684 0-685 0-6855 0-686 
0-715 0-716 0-717 0-718 0-719 
0-745 0-747 0-748 0-749 0-750 
0-773 0-775 0- 0-772 0:779 
0-799 0-801 0- 0-804 0:805 
0-822 0-825 0- 0-828 0-830 
0-843 0-846 0- 0-850 0-851 
0-862 0-865 0-86 0-870 0-871 
0-879 0-883 0- 0-887 0-889 
0-890 0-8945 | 0-898 0- 0-9025 | 0-904 
0:903 0-908 0-911 0- 0-916 0-918 
0-915 0-920 0-923 0- 0-928 0-930 
0-925 0-930 0-9335 0- 0-938 0-940 
0-934 0-939 0:943 0- 0-947 0-949 
0:949 0-947 0-950 0-97 0:955 0:957 
0:949 0-954 0-957 0- 0-962 0-963 
0-955 0-960 0-963 0- 0-967 0-969 
0-9605 0-965 0-968 0- 0:972 0:974 
0-965 0-969 0-9725 0- 0-9765 0-978 
0-969 0-973 0-976 0-4 0-980 0-981 
0-973 0-977 0-9795 | 0-9815 | 0-983 0-984 
0-976 0-980 0-982 0-984 0-986 0-987 
0-979 0-982 0-985 0-9865 | 0-088 0-989 
0-981 0-984 0-987 0-988 0-990 0-991 
0-983 0-986 0-9885 0-990 0-991 0-992 
0-985 0-988 0-990 0-9915 0:9925 0:993 
0-987 0-989 0-991 0-993 0-994 0-994 
0-988 0-991 0-9925 0-994 0-995 0-995 
0-989 0-992 0-993 0-995 0-995 0-996 
0-990 0-993 0-994 0-995 0-996 0-997 
0-991 0-994 0-995 0-996 0-997 0:997 
0:992 0-994 0-996 0-9965 | 0-997 0:998 
0-993 0-995 0-996 0-997 0-9975 | 0-998 
0-994 0-9955 | 0-997 0-997 0-998 0-998 
0-994 0-996 0-997 0-998 0-998 0-998* 
0-995 0-996 0-997 0-998 0-998 0-999 
0-995 0-997 0-998 0-998 0-999 0-999 
0-996 0-997 0-998 0-9985 | 0-999 0-999 
0-996 0-9975 | 0-998 0-999 0-999 0-999 
0-9965 | 0-998 0-998 0-999 0-999 0-999 
0-997 0-998 0-999 0-999 0-999 0-999 
0-997 0-998 0-999 0-999 0-999 0-9995 
0-997 0-998 0-999 0-999 0-999 1-000 
0-998 0-9985 0-999 0-999 0-9995 
0-998 0-999 0-999 0-999 1-000 
0-998 0-999 0-999 0-9995 
0-998 0-999 0-999 0-9995 
0-998 0-999 0-999 1-000 
0-998 0-999 0-999 
0-9985 | 0-999 0-9995 
0-999 0-999 0-9995 
0-999 0-999 1-000 
0-999 0-999 
0-999 0-999 
0-999 0-9995 
0-999 0-9995 
440 <a 


i 


TABLE 3 


proceeding by Intervals of 0-1 from 0 to 6, and for Values of v from 1 to 20. 


tables by “ Student” in Metron, 5, 1925.) 


t. п. 12. 13. 14. 15. 16. 17. 18. 19. 20. 
0 0 | 0-500 0-500 | 0-500 | 0-500 | 0-500 | 0-500 
01 9 | 0-539 0-539 | 0-539 | 0-539 | 0-539 | 0-539 
0-2 0:577 | 0-578 0-578 | 0-578 | 0578 | 0-578 | 0-578 
0-3 0-615 | 0-615 0-016 | 0-616 | 0-616 | 0-616 | 0-616 
0-4 0-652 | 0-052 0-652 | 0-653 | 0-653 | 0-653 | 0-653 
0-5 0-6865 | 0-687 0-688 | 0-688 | 0-688 | 0-688 | 0-688 
0-6 0:720 | 0-720 0-721 | 0-721 | 0-7215 | 0-722 | 0-722 
0-7 0-751 | 0-751 0-752 | 0-753 | 0-753 | 0-753 | 0-754 
0-8 0-780 | 0-780 0-7815 | 0-782 | 0-782 | 0-783 | 0-783 
0-9 0-806 | 0-807 0:808 | 0-809 | 0-809 | 0-810 | 0-810 
10 0-831 | 0-8315 0-833 | 0-833 | 0-834 | 0-834 | 0-835 
1-1 0-853 | 0-8535 0-855 | 0-856 | 0-856 | 0-857 | 0-857 
13 0:872 | 0-873 0:875 | 0-876 | 0-876 | 0-877 | 0-877 
13 0-890 | 0-891 0-893 | 0-893 | 0-894 | 0-S945 | 0-395 
14 0-005* | 0-907 0-908 | 0-909 | 0-910 | 0-910 | 0-911 
1-5 0-919 | 0-920 0-922 | 0-993 | 0-9235 | 0-924 | 0-9245 
16 0-931 | 0-932 0-934 | 0:935 | 0-935 | 0-936 | 0-9365 
17 0-941 | 0-943 0-944 | 0-945 | 0-046 | 0-946 | 0-947 
18 0-050 | 09515 0:953 | 0-954 | 0:955 | 0-955 | 0-956 
1-9 0-058 | 0-959 0-961 | 0-962 | 0-962 | 0-963 | 0-963 
2.0 0-065 | 0-966 0:967 | 0-968 | 0-969 | 0-969 | 0-970 
21 0:970 | 0-971 0:973 | 0:9735 | 0-974 | 0-9745 | 0-975 
9.9 0-975 | 0-976 0:977 | 0-978 | 0-979 | 0-979 | 0-979 
2.3 0:979 | 0-980 0-981 | 0-982 | 0-982 | 0-983 | 0-983 
24. 0-082 | 0-983 0-085 | 0-985 | 0-9855 | 0-986 | 0-986 
2-5 0-985 | 0-986 0-987 | 0-988 | 0-988 | 0-9885 | 0-989 
2-6 0-988 | 0-988 0-0895 | 0-990 | 0-990 | 0-991 | 0-991 
9.7 0:990 | 0-990 0-991 | 0-992 | 0-992 | 0-992 | 0-993 
9.8 0:991 | 0-992 0-993 | 0-993 | 0-994 | 0-094 | 0-994 
2.9 0-993 | 0-993 0-004 | 0-9945 | 0-9945 | 0-995 | 0-995 
3:0 0-994 | 0-9945 0-995 | 0-9955 | 0-996 | 0-996 | 0-996 
3-1 0-995 | 0-995 0:996 | 0-996 | 0-997 | 0-997 | 0-997 
3.2 0:996 | 0-996 0:997 | 0-997 | 0-997 | 0-997 | 0-9975 
3:3 0-9965 | 0-997 0:997 | 0-998 | 0-998 | 0-998 | 0-998 
34 0:997 | 0-997 0-998 | 0-998 | 0-998 | 0-998 | 0-998 
3-5 0-9975 | 0-998 0-998 | 0-998 | 09985 | 0-999 | 0-999 
3-6 0-998 | 0-998 0-099 | 0-999 | 0-999 | 0-999 | 0-999 
377 0:998 | 0-9985 0-999 | 0-999 | 0-999 | 0-999 | 0-999 
3-8 0-9985 | 0-999 0.999 | 0-999 | 0-999 | 0-099 | 0-999 
39 0-999 | 0-999 0.999 | 0-999 | 0-999 | 0-999 | 0-9995 
4-0 0-999 | 0-999 0-999 | 0-999 | 0-9995 | 0-9995 | 1-000 
41 0-999 | 0-999 0.9995 | 0-9995 | 1:000 | 1-000 | 
42 0-999 | 0-099 1-000 | 1-000 
43 0-999 | 0-9995 
44 0:9995 | 1:000 
45 0:9995 
4-6 1-000 | 

| | 


Note.—The methods by which ‘ Student” calculated the Metron tables are explained in notes by him 
and R. A. Fisher in that journal, vol. 5, Part 3, 1925, pp. 18-24. The four figures of those values have been 
rounded up to three in the above table, except when the four-figure value concluded with a 5, in which case 
it is shown in full. In columns in which values greater than 0:9995 occur the first is written 1-000 and the 
remainder left blank. 




APPENDIX TABLE 4 


(Reprinted from Table VI of Prof. R. A. Fisher’s Statistical Methods for Research Workers, 
Oliver and Boyd, Ltd., Edinburgh, by kind permission of the author and the publishers.) 


5 PER CENT. POINTS OF THE DISTRIBUTION OF z.


Values of ν₁.


1 | 2 | 3 | 4 | 5 | 6 | 8 | 12 | 24 | ∞
1 | 2-5421 | 2.6479 | 2 6870 | 2-7071 | 2-7194 2-7276 | 2-7380 | 2-7484 | 2.7588 2-1693 
2 | 1-4592 | 1.4722 | 1 4765 | 1-4787 | 1-4800 | 1:4808 | 1.4819 | 1-4830 | 1-4840 1-4851 
3 | 1-1577 | 1-1284 | 1-1137 | 1-1051 | 1.0994 | 1-0953 | 1.0399 | 1.0842 | 10731 | 1.0716 
4 | 1:0212 | 0-9690 | 0-9429 | 0-9272 | 0-9168 | 0-9093 | 0-8993 | 0.8885 | 0.8767 | 0.8639 
5 | 0-9441 | 0:8777 | 0-8441 | 0-8236 | 0-8097 | 0-7997 | 0-7862 | 0.7714 | 0.7550 | 0.7308 
6 | 0:8948 | 0-8188 | 0-798 | 0-7558 | 0-7394 | 0-7274 | 0-7112 | 0.6931 | 0.6729 | 0.6199 
7 | 08606 | 0-7777 | 0-7347 | 0-7080 | 0-6896 | 0-6761 | 0-6576 | 0-6369 | 0.6134 | 0.5802 
8 | 0:8355 | 0:7475 | 0-7014 | 0:6725 | 0-6525 | 0-6378 | 0-6175 | 0:5945 | 0.5682 | 0.5371 
9 | 0:8163 | 0:7242 | 0-6757 | 0-6450 | 0-6238 | 0-6080 | 0.5862 | 0-5613 | 0.5324 | Ойто 
10 | 0:8012 | 0:7058 | 0-6553 | 0-6232 | 0-6009 | 0:5843 | 0-5611 | 0-5346 | 0.5035 | odono 
11 | 0-7889 | 0:6909 | 0-6387 | 0.0055 | 0-5822 | 0.5648 | 0-5406 | 0-5126 | 0-4795 | 0.4387 
12 | 0:7788 | 0:6786 | 0-6250 | 0-5907 | 0-5666 | 0-5487 | 0.5234 | 0.4941 | 0.4592 | 0.4156 
18 | 0:7703 | 0:6682 | 0:6134 | 0-5783 | 0-5535 | 0-5350 | 0-5089 | 0-4785 | 0.4419 03957 
| 14 | 0:7630 | 0-6594 | 0-6036 | 0-5677 | 0-5423 | 0-5233 | 0-4964 | 0.4649 | 0.4269 | 0.3282 
15 | 0:7568 | 0-6518 | 0-5950 | 0-5585 | 0:5326 | 0.5131 | 0.4855 | 0.4532 | 0.4138 | onang 


5811 | 0-5434 | 0-5166 | 0-4964 | 0-4676 | 0-4337 0:3919 | 0-3366 
‘5753 | 0-5371 | 0-5099 | 0-4894 | 0-4602 | 0-4255 0:3827 | 0.3253 


Values of »,. 


"5701 | 0-5315 | 0-5040 | 0-4832 | 0-4535 | 
“5654 | 0-5265 | 0-4986 | 0-4776 | 0-4474 | 0-4116 | 0-3668 0-3057 


= 
А 
E 
со 
б 
= 
©з 
СЄ] 
ч 
© 
= 
oo 
Еч 
e 
E 


:5612 | 0-5219 | 0-4938 | 0-4725 | 0-4420 | 
‘5574 | 0-5178 | 0-4894 | 0-4679 | 0-4370 | 0-4001 | 0-3536 | 0-2892 
23 | 0-7269 | 0-6151 | 0-5540 | 0-5140 | 0-4854 | 0-4636 | 0-4325 | 0-3950 | 0-3478 | 0.2818 

+5508 | 0-5106 | 0-4817 | 0-4598 | 0-4283 | 0-3904 | 0-3425 | 0.2749 
25 | 0:7225 | 0.6097 | 0-5478 | 0-5074 | 0-4783 | 0-4562 | 0-4244 | 0.3862 | 0-3376 | 0.2685 
26 | 0-7205 | 0-6073 | 0-5451 | 0-5045 | 0-4752 | 0-4529 | 0-4209 
27 | 0-7187 | 0-6051 | 0-5427 | 0-5017 | 0-4723 | 0-4499 | 0-4176 | 0-3786 | 0-3287 | 0.2509 
28 | 0-7171 | 0-6030 | 0-5403 | 0-4992 | 0-4696 | 0-4471 | 0-4146 | 0-3752 | 0-3248 | 0.2516 
29 | 0:7155 | 0-6011 | 0-5382 | 0:4969 | 0-4671 | 0-4444 | 0-4117 | 0-3720 | 0-3211 | 0.2466 
30 | 0/7141 | 0:5994 | 0-5362 | 0-4947 | 0-4648 | 0-4420 | 0-4090 | 0-3691 | 0-3176 | 0.2419 


0 
0 
0 
0 
0 
16 | 0-7514 | 0-6451 | 0-5876 | 0-5505 0:5241 | 0-5042 | 0-4760 | 0-4428 | 0-4022 0-3490 
0 
0 
0 
0 
0 
0 


= 
oo 
EA 
re] 
p 
= 
©з 
© 
es 
© 
= 
о 
© 
to 
© 


60 | 0-6933 | 0-5738 | 0-5073 | 0-4632 | 0-4311 | 0-4064 0-3702 0-3255 | 0-2654 


c» | 0:6729 | 0:5486 | 0-4787 | 04319 | 0-3974 | 0:3706 | 0-3309 | 0.2804 0.2085 | o 


f 
| 
) 
| 




APPENDIX TABLE 5 


(Reprinted from Table VI of Prof. R. A. Fisher’s Statistical Methods for Research Workers, 
Oliver and Boyd, Edinburgh, by kind permission of the author and the publishers.) 


1 PER CENT. POINTS OF THE DISTRIBUTION OF z.


Values of ν₁.


J 2 3 | 4 5 6. 8 12. 24. 
| | | | 
4.1535 | 4-2585 | 4-2974 | 4-3175 | 4-3297 | 4-3379 | 4-3482 | 4-3585 | 4-3689 
2.2950 | 2-2976 | 2-2984 | 2-2988 | 2-2991 | 2-2992 | 2.2994 | 2-2997 | 2.2999 
17649 | 1-7140 | 1-6915 | 1-6786 | 1.6703 | 1-6645 | 1-6569 | 1-6489 | 1-6404 
1:5970 | 1-4452 | 1-4075 | 1-3856 | 1-3711 | 1-3609 | 1-3473 | 1-3327 | 1-3170 
12929 | 1-2449 | 1-2164 | 1-1974 | 1-1838 | 1-1656 | 1-1457 | 1-1239 


13103 | 1-1955 | 1-1401 | 1-1068 | 1-0843 | 1-0680 | 1-0460 | 1.0218 | 0.9948 
123526 | 1-1281 | 1-0672 | 1-0300 | 1-0048 | 0-9864 | 0-9614 | 0-9335 | 0-9020 
1-2106 | 1:0787 | 10135 | 0:9734 | 0-9459 | 0.9259 | 0-8983 | 0-8673 | 0:8319 
11780 | 1-0411 | 0-9724 | 0-9299 | 0-9006 | 0-8791 | 0-8494 | 0-8157 | 0-7769 
10 | 11535 | L-0114 | 0.9399 | 0-8954 | 0-8646 | 0-8419 | 0-8104 | 0.7744 | 0.7324 


OHARA 
= 
w 
о 
К 
© 


11 | 1-1333 | 0-9874 | 0-9136 | 0-8674 | 0-8354 | 0-8116 | 0-7785 | 0.7405 | 0-6958 
12 | 1-1166 | 0-9677 | 0-8919 | 0-8443 | 0-8111 | 0-7864 | 0-7520 | 0-7122 | 0-6649 
13 | 1-1027 | 0-9511 | 0-8737 | 0-8248 | 0-7907 | 0-7652 | 0-7295 | 0-6882 | 0-6386 
14 | 1-0909 | 0-9370 | 0-8581 | 0-8082 | 0-7732 | 0-7471 | 0-7103 | 0-6675 | 0-6159 
15 | 1-0807 | 0-9249 | 0-8448 | 0.7939 | 0-7582 | 0-7314 | 0-6937 | 0-6496 | 0-5961 
16 | 1-0719 | 0-9144 | 0-8331 | 0.7814 | 0-7450 | 0-7177 | 0-6791 | 0-6339 | 0.5786 
0-9051 | 0-8229 | 0.7705 | 0-7335 | 0-7057 | 0-6663 | 0-6199 | 0-5630 
18 | 1-0572 | 0-8970 | 0-8138 | 0-7607 | 0-7232 | 0-6950 | 0-6549 | 0.6075 | 0-5491 
19 | 1-0511 | 0-8897 | 0-8057 | 0-7521 | 0-7140 | 0-6854 | 0-6447 | 0-5964 | 0-5366 
20 | 1.0457 | 0-8831 | 0-7985 | 0-7443 | 0-7058 | 0-6768 | 0-6355 | 0-5564 | 0-5253 


Values of v} 
= 
= 
E 
о 
с 
rg 
= 


| 
21 | 1.0408 | 0-8772 | 0-7920 | 0-7372 | 0-6984 | 0-6690 | 0-6272 | 0:5773 | 0-5150 
99 | 1-0363 | 0-8719 | 0-7860 | 0.7309 | 0-6916 | 0-6620 | 0-6196 | 0-5691 | 0-5056 
23 | 1-0322 | 0-8670 | 0-7806 | 0-7251 | 0-6855 | 0-6555 | 0.6127 | 0-5615 | 0-4969 
24 | 1.0285 | 0-8626 | 0-7757 | 0-7197 | 0-6799 | 0-6496 | 0-6064 | 0-5545 | 0-4890 
95 | 1-0251 | 0-8585 | 0-7712 | 0-7148 | 0-6747 | 0-6442 | 0-6006 | 0-5481 | 0-4816 
26 | 1-0220 | 0-8548 | 0-7670 | 0-7103 | 0-6699 | 0-6392 | 0-5952 | 0-5422 | 0-4748 
27 | 1-0191 | 0-8513 | 0-7631 | 0-7062 | 0-6655 | 0-6346 | 0-5902 | 0-5367 | 0-4685 
98 | 1-0164 | 0-8481 | 0-7595 | 0.7023 | 0-6614 | 0-6303 | 0-5856 | 0.5316 | 0-4626 
29 | 1.0139 | 0-8451 | 0-7562 | 0-6987 | 0-6576 | 0-6263 | 0-5813 | 0-5269 | 0-4570 
30 | 1-0116 | 0-8423 | 0-7531 | 0-6954 | 0-6540 | 0.6226 | 0-5773 | 0.5224 | 0-4519 


60 | 0-9784 | 0-8025 | 0-7086 | 0-6472 | 0-6028 | 0-5687 | 0-5189 | 0-4574 | 0:3746 


co | 0-9462 | 0.7636 | 0-6651 0-5999 | 0-5522 | 0-5152 | 0-4604 | 0-3908 | 0-2913 




APPENDIX TABLE 6 


Distribution Function of χ² for One Degree of Freedom for Values of χ² from χ² = 0 to
χ² = 1 by steps of 0.01.


P Р 4 z? | P A 
0 1-00000 7966 0-50 0-47950 436 
0-01 0-92034 3280 0-51 0:47514 430 
' 0-02 0-88754 2505 0-52 0-47084 423 | 
0-03 0-86249 2101 0-53 0-46661 418 | 
o e 1842 0-54 0-46243 411 
05 -823 1656 0-55 0-45832 406 
0-06 0-80650 1516 0-56 0-45426 400 
0-07 0-79134 1404 0-57 0-45026 395 
0-08 4 — 077730 1312 0-58 0-44631 389 
0-09 0-76418 1235 0-59 0-44242 384 
0-10 0-75183 1169 0-60 0-43858 379 
0-11 074014 1111 0-61 0-43479 374 h 
0-12 0-72903 1060 0-62 0-43105 369 x 
0-13 0-71843 1015 0-63 0-42736 365 
0:14 0:70828 974 0-64 . 0-42371 360 
0-15 0-69854 938 0-65 0-42011 355 | 
0-16 0-68916 905 0-66 0-410656 351 
047 0-68011 874 0-67 0-41305 346 
0-18 0-67137 845 0-68 0-140959 343 
0-19 0-66292 820 0-69 0-40616 338 
0-20 0-65472 795 0-70 0-40278 334 
0-21 0-64677 773 0-71 0-39944 330 
0-92 0-63904 752 0-72 0-39614 326 
0-23 0-63152 731 0-73 0-39288 322 
0-24 0-62421 713 0-74 0:38966 318 | 
0-25 0-61708 696 0-75 0-38648 315 | 
0-26 0-61012 679 0-76 0-38333 311 
0-27 0-60333 663 0:77 0-38022 308 
0-28 0-59670 648 0-78 0-37714 304 
0:29 0-59022 634 0-79 0-37410 301 | 
0-30 0-58388 620 0-80 0-37109 297 d 
0-31 0-577068 607 0-81 0-36812 E 
0-32 0-57161 595 0-82 0-36518 
0-33 0-56566 583 0-83 0-36227 287 
0-34 0-55983 572 0-84 0-35940 285 
0-35 0-55411 560 0-85 0-35655 281 
0-36 0-54851 551 0-86 0:35374 278 \ 
0-37 0-54300 540 0-87 0-35096 276 
0-38 053760 530 0-88 0-34820 272 
0-39 0-53230 521 0-89 0-34548 270 
0-40 0-52709 512 0-90 0-34278 267 
0-41 0:52197 503 | 0-91 0-34011 264 
d 0-51694 495 0-92 0-33747 261 
0-43 0-51199 487 0-93 0-33486 258 
044 0-50712 479 0-94 0-33228 256 
0.45 0-50233 471 0-95 0-32972 258 
29 025765 463 0-96 0-32719 251 
0: 0.49299 457 0-97 0-32468 248 
0-47 0.48842 449 0-98 0-32220 246 
d 0-483293 443 (n з 243 n 
0-50 0-47950 436 1-00 0-31731 241 ГА 
ed ] 
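
The entries of Appendix Tables 6 and 7 can be checked numerically. The sketch below assumes, as the tabulated values suggest (P = 1 at χ² = 0, falling to 0.31731 at χ² = 1), that P is the probability of exceeding the given χ² with one degree of freedom, which equals erfc(√(χ²/2)), and that Δ is the first difference of P in units of the fifth decimal place. The function names are illustrative only and do not come from the text.

```python
import math


def chi2_upper_tail_1df(x):
    """Probability that chi-square with 1 degree of freedom exceeds x.

    For one degree of freedom this equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def tabulate(start, stop, step, places=5):
    """Return (x, P, delta) rows in the layout of the appendix tables.

    delta is P(x) - P(x + step), computed from the rounded values and
    expressed in units of the last decimal place (an assumed reading
    of the table's difference column).
    """
    n = round((stop - start) / step)
    rows = []
    for i in range(n + 1):
        x = round(start + i * step, 10)
        p = round(chi2_upper_tail_1df(x), places)
        p_next = round(chi2_upper_tail_1df(x + step), places)
        delta = round((p - p_next) * 10 ** places)
        rows.append((x, p, delta))
    return rows


if __name__ == "__main__":
    for x, p, delta in tabulate(1.0, 2.0, 0.1):
        print(f"{x:4.1f}  {p:.5f}  {delta:5d}")
```

Since the differences are formed from rounded values, a recomputed Δ may occasionally differ from a printed one by a unit in the last place.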


APPENDIX TABLE 7 


Distribution Function of χ² for One Degree of Freedom for Values of χ² from 1 to 10 by
Steps of 0.1.


χ² | P | Δ | χ² | P | Δ
1.0 | 0.31731 | 2304 | 5.5 | 0.01902 | 106
1.1 | 0.29427 | 2095 | 5.6 | 0.01796 | 99
1.2 | 0.27332 | 1911 | 5.7 | 0.01697 | 94
1.3 | 0.25421 | 1749 | 5.8 | 0.01603 | 89
1.4 | 0.23672 | 1605 | 5.9 | 0.01514 | 83
1.5 | 0.22067 | 1477 | 6.0 | 0.01431 | 79
1.6 | 0.20590 | 1361 | 6.1 | 0.01352 | 74
1.7 | 0.19229 | 1258 | 6.2 | 0.01278 | 71
1.8 | 0.17971 | 1163 | 6.3 | 0.01207 | 66
1.9 | 0.16808 | 1078 | 6.4 | 0.01141 | 62
2.0 | 0.15730 | 1000 | 6.5 | 0.01079 | 59
2.1 | 0.14730 | 929 | 6.6 | 0.01020 | 56
2.2 | 0.13801 | 864 | 6.7 | 0.00964 | 52
2.3 | 0.12937 | 803 | 6.8 | 0.00912 | 50
2.4 | 0.12134 | 749 | 6.9 | 0.00862 | 47
2.5 | 0.11385 | 699 | 7.0 | 0.00815 | 44
2.6 | 0.10686 | 651 | 7.1 | 0.00771 | 42
2.7 | 0.10035 | 609 | 7.2 | 0.00729 | 39
2.8 | 0.09426 | 568 | 7.3 | 0.00690 | 38
2.9 | 0.08858 | 532 | 7.4 | 0.00652 | 35
3.0 | 0.08326 | 497 | 7.5 | 0.00617 | 33
3.1 | 0.07829 | 465 | 7.6 | 0.00584 | 32
3.2 | 0.07364 | 436 | 7.7 | 0.00552 | 30
3.3 | 0.06928 | 408 | 7.8 | 0.00522 | 28
3.4 | 0.06520 | 383 | 7.9 | 0.00494 | 26
3.5 | 0.06137 | 359 | 8.0 | 0.00468 | 25
3.6 | 0.05778 | 337 | 8.1 | 0.00443 | 24
3.7 | 0.05441 | 316 | 8.2 | 0.00419 | 23
3.8 | 0.05125 | 296 | 8.3 | 0.00396 | 21
3.9 | 0.04829 | 279 | 8.4 | 0.00375 | 20
4.0 | 0.04550 | 262 | 8.5 | 0.00355 | 19
4.1 | 0.04288 | 246 | 8.6 | 0.00336 | 18
4.2 | 0.04042 | 231 | 8.7 | 0.00318 | 17
4.3 | 0.03811 | 217 | 8.8 | 0.00301 | 16
4.4 | 0.03594 | 205 | 8.9 | 0.00285 | 15
4.5 | 0.03389 | 192 | 9.0 | 0.00270 | 14
4.6 | 0.03197 | 181 | 9.1 | 0.00256 | 14
4.7 | 0.03016 | 170 | 9.2 | 0.00242 | 13
4.8 | 0.02846 | 160 | 9.3 | 0.00229 | 12
4.9 | 0.02686 | 151 | 9.4 | 0.00217 | 12
5.0 | 0.02535 | 142 | 9.5 | 0.00205 | 10
5.1 | 0.02393 | 134 | 9.6 | 0.00195 | 11
5.2 | 0.02259 | 126 | 9.7 | 0.00184 | 10
5.3 | 0.02133 | 119 | 9.8 | 0.00174 | 9
5.4 | 0.02014 | 112 | 9.9 | 0.00165 | 8
5.5 | 0.01902 | 106 | 10.0 | 0.00157 | 8


[Appendix Diagram: contour lines of the surface giving P as a function of χ² and the degrees of freedom ν. Axes: values of χ²; degrees of freedom ν.]


INDEX 


(References are to pages.) 


Abortion, distribution of women according to term 
of, (Table 1.23), 26. 

Abrupt distributions, corrections for grouping to, 
79; refs., 85-6. 

Absolute moments, 56; Liapounoff's inequality 
for, (Exercise 3.14), 88. 

Accidents, exemplified by Poisson distribution, 
124. 

Adyanthaya, N. K., refs., distribution of frequency 
constants in small samples, (under Pearson), 
28. 

Age, and highest audible pitch,
(Table 14.1), 325; (Example 14.1), 331.

Agreement, coefficient of, 427-9; significance of,
429-35.

Agricultural Research Institute, Oxford, data
from Report of, (Table 1.9), 9.

Alcoholism and crime, Goring's data on, (Table
14.6), 356.

—— and health, (Exercise 14.12), 366.

Ammon, O., data from (hair and eye-colour),
(Table 12.4), 300.

Anthropometric Committee of British Association,
data from Report of, (Table 1.7), 8;
(Table 1.10).

Antimode, definition of, 35.

Approximations to sampling distributions, see
Sampling distributions.

Arithmetic mean, see Mean, arithmetic.

Aroian, L. A., fitting of Type B distribution to
data, (Example 6.4), 156; refs., Type B
series, 160.

Arrays, in bivariate distributions, 327.

Association, generally, 308-23; coefficients of,
310-13; partial, 313-17; illusory, 317-18.

Asymmetrical distributions, 10; see Skewness.

Attributes, sampling of, 197-201; in Poisson dis-
tribution, (Exercise 8.2), 203; in finite
populations, (Exercise 8.3), 203.

Australian marriages, distribution of, (Table 1.8),
9; moments of, (Example 3.1), 50-2;
β₁ and β₂ of, (Example 3.16), 82.

Average, see Mean. 

—— corrections to moments, 74-5; see also
Sheppard's corrections.


b₁ (sampling value of β₁, measure of skewness),
279-80.

Babington Smith, B., data from, on random num-
bers, (Table 8.3), 189; Random Sampling
Numbers, 193, 197; refs. (under Kendall),
202; distribution of Spearman's ρ, problem
of m rankings, method of paired com-
parisons, 436.

Baker, G. A., distribution of means in Type A 
series, (Exercise 10.7), 252. 

Bayes, T., refs., doctrine of chances, 183. 

Bayes’ theorem, 175-7; postulate, 176-8; com- 
parison with maximum likelihood, 178-80 ; 
in estimating proportion of attributes, 200 ; 
(Exercise 8.5), 203. 

Beans, distribution of, (Table 1.15), 20; histogram 
of, (Figure 1.4), 20; fitting of Pearson dis- 
tribution to, (Example 6.1), 143-4; Gram- 
Charlier series fitted to, (Example 6.2), 151. 

Bernouilli polynomials, definition of, 58.

—— numbers, footnote, 69; 71, 78.

Bernstein, S., refs., extension of central limit
theorem, 183.

β₁, β₂ (skewness and kurtosis), 81; standard errors
of, 225; sampling distributions of, 279-80,
(Exercise 11.17), 289; generalised β, 82.

B-function, in summing binomial, 120. 

Bias in sampling, 187-90; in choosing plants, 
(Example 8.1), 187-8; in scale reading, 
(Example 8.2), 188 ; in reading randomising 
machine, (Example 8.3), 189; in crop- 
reporting, (Example 8.4), 189-90. 

Binomial distribution, general properties, 116-20 ; 
moments of, 117, (Example 3.2), 52; dis-
tribution function of, 119-20; γ₁ and γ₂
of, (Example 3.17), 82; factorial moments
of, (Exercise 3.6), 87; limiting form, 
(Example 4.6), 103; arising from mixed 
population, 122-4; with negative index, 
125-6 and (Exercise 5.7), 136; bivariate 
form, 133-4; cumulants of, (Exercise 5.1), 
135; incomplete moments of, (Exercises 
5.2 and 5.3), 135 ; in sampling of attributes, 
198; distribution of means of, (Example 
10.8), 243. 

Birth-rates, distribution of in Local Government 
Areas, (Table 1.1), 3; frequency polygon 
of, (Table 1.1), 4.

Biserial y, 356-8. 

Bivariate binomial distribution, 133-4.

—— frequency-distributions, 19-22.

—— moments and cumulants, 79-81; standard
errors of, 211; k-statistics and cumulants,
281-3.

—— normal distribution, 22; (Example 3.15),
79-80; moments of, (Exercise 3.15), 89;
as limit of bivariate binomial, 133-4;
correlation and regression of, 334-6; multi-
variate form, 376-7.

Bivariate Poisson distribution, (Exercise 5.8), 136. 

Borel, E., refs., Traité du Calcul des Probabilités, 
22, 183. 

Bortkiewicz, L. von, data from on suicide, (Table
1.6), 7. 

Bose, S. S., distribution of variance ratio, (Exer- 
cise 14.8), 365. 

Bowley, A. L., refs., F. Y. Edgeworth’s Contribu-
tions to Mathematical Statistics, 160; repre-
sentative method, 202.

Brood-mares, distribution of fecundity in, (Table 
1.20), 24. 


Call discount rate, distribution of weekly returns 
according to, (Table 1.26), 28. 

Camp, B. H., refs., distribution functions of bi-
nomial and hypergeometric, 134.

Card-shuffling, tested by χ², (Example 12.1), 297-9;
tested by rankings, (Example 16.5), 420.

Carleman, T., criteria for uniqueness in the prob- 
lem of moments, 109; refs., Les fonctions 
quasi-analytiques, 114. 

Carver, H. C., Sheppard corrections for discrete 
variables, 85. 

Cauchy distribution, (Example 3.12), 67-8; char- 
acteristic function of, (Example 4.2), 95-6 ; 
distribution of mean of samples from, 
(Example 10.1), 233-4 and (Example 10.15), 
247. 

Cave, B. M., refs., sampling of correlation co- 
efficient (under Co-operative Study), 363. 

Census of Population, data from Housing Report,
(Table 1.24), 27.

—— of Production, data from, on size of firms,
(Table 1.17), 23.

Central Limit Theorem, 180-3.

Characteristic functions, as moment-generating
functions, 54; general theory of, 90-115;
limiting properties of, 99-104; multi-
variate, 104-5; conditions for a function
to be, 98-9; in sampling distributions,
242-6.

Charlier, C. V. L., Types A and B series, 147 (see
Gram-Charlier series); refs., expansion of
frequency functions, 160.

Cheshire, L., refs., significance of correlation co-
efficient, (under E. S. Pearson), 363.

Chi-square, see χ².

χ²-distribution; generally, 290-307; properties
of, 292-7; in 2 × 2 tables, 303; correction
for continuity, 303-4; as square con-
tingency, 319.

Cholera, inoculation against, (Table 12.6), 302;
(Example 13.1), 309; (Example 13.2),
311-12; (Example 13.3), 313.

Church, A. E. R., sampling moments, 256; refs.,
284-5.

Circular triads, in preferences, 423.

Class-frequency, definition, 2.

Class-interval, definition, 9; ambiguities in, 5.

Cloudiness, distribution of days according to,
(Table 1.11), 10.

Cochran, W. G., refs., χ²-distribution, 305.

“Cocked-hat” as synonym for unimodal, 29.

Coefficients of association, contingency, correlation,
etc., see under Association, Contingency,
Correlation, etc.

Coin-tossing, as example of sampling of attributes,
(Example 8.9), 198.

Colligation, coefficient of, 311.

Combinatorial method, in sampling of k-statistics,
see k-statistics.

Comparisons, paired, see Paired comparisons.

Comrie, L. J., refs., Tables of arc tan x and log
(1 + x²), 160.

Concentration, coefficient of, 43; curve of, 43-4
and (Figure 2.3), 44.

Concordance, coefficient of, 411; see also m
rankings.

Consistence, coefficient of, 425.

Contingency, 318-22; coefficient of, 319.

Continuity correction to χ², 303-5.

Continuous frequency functions, 13; sampling
from, 197.

Continuum, probability in, 170.

Co-operative Study on correlation coefficient, refs.,
363.

Cornish, E. A., refs., moments and cumulants in
specification of distributions, 160.

Corrections for grouping in calculation of moments,
30, 41; when distribution is abrupt, 79,
refs., 85-6; see also Sheppard's corrections.

Correlation, coefficient of product-moment, gener-
ally, 324-67; definition, 329; calculation
of, 330-4; in bivariate normal distribution,
334; sampling of, 336-48; standard error
of, (Example 9.6), 211; Fisher’s trans-
formation of, 345; tables of (David), 345.

—— coefficient of multiple correlation, 380-1;
sampling of, 381-5.

—— coefficient of partial, 368-79; definition,
370; in terms of coefficients of lower orders,
372; geometrical interpretation, 372-3;
examples of, (weather and crops, Example
15.1), 373-5, (crime and religion, Example
15.2), 375-6; in multivariate normal dis-
tribution, 376-8; sampling distribution of,
378-9.

—— intra-class, 358-62.

—— rank, generally, 388-421; Spearman's co-
efficient, 388-91; sampling of, 394-403;
coefficient τ, 391-3; sampling of, 403-8.
See also m rankings.



Correlation ratios, definition, 351; sampling dis- 
tribution in uncorrelated normal population, 
352-3; relation with multiple correlation, 
381; for ranked data (Wallis), 437. 

Covariance, definition, 79; notation for, 204;
calculation of, 330; distribution of in
normal samples, 339-42.

Cows, distribution of according to age and milk-
yield, (Table 1.25), 27.

Craig, C. C., corrections to moments of discrete
distribution, 77, refs., 80; (Exercise 3.13),
88; sampling of cumulants, 256, refs.,
285.

Cramér, H., convergence of Gram-Charlier series,
151-2, 159, refs., 160; central limit theorem,
181-3; distribution of a ratio (Exercise
10.8), 252; refs., Random Variables and
Probability Distributions, 114, 183, 250.

Crime and alcoholism, Goring’s data on, (Table
14.6), 356.

—— correlation with religion, (Example 15.2),
375.

Crop-reporters, bias in, (Example 8.4), 189-90.

Crops, correlation with weather, (Example 15.1),
373.

Cuckoo's eggs, distribution of length of, (Exercise
14.13), 366.

Cumulants, definition, 60; invariantive properties
of; relations with moments, 61-4;
existence of, 64-5; calculation of, 65-8;
in bivariate case, 80; Sheppard's correc-
tions to, 78 and (multivariate case) 80-1;
generating functions for, 90; of normal
distribution, (Example 3.10), 67.

Cumulative function, 90.

Curve of concentration, see Concentration.


Dairy farms, distribution according to costs of
milk production, (Table 1.9), 9.

David, F. N., distribution of difference of Type
III variates, (Exercise 10.6), 252; Tables
of the Correlation Coefficient, 345, refs.,
363.

Davies, O. L., refs., estimation of standard devia-
tion, 228.

Deaf-mutes, distribution of children of, (Table
1.19), 24.

Deaths, from scarlet fever, (Table 1.3), 5; dis-
tribution of, according to age at death,
(Table 1.12), 11.

Deciles, definition, 36; interdecile range, 38;
standard errors of, 225.

de Finetti, B., refs., calculation of mean difference,
47.

Degrees of freedom, in χ²-distribution, 292; in
contingency table, 299.

de la Vallée Poussin, C. J., refs., Cours d'analyse.



Demoivre, A., discoverer of normal distribution, 
131. 
Denjoy, A., theorem on uniqueness of quasi- 
analytic functions, (Exercise 4.10), 115. 
Dice, throws with, Weldon’s data (Table 1.14), 19;
(Table 1.16), 23; (Example 8.10), 199;
(Table 12.5), 301.
Digits, distribution of, from telephone directory,
(Table 1.4), 6.
Direct probability, see Probability.
Dirichlet integrals, in Inversion Theorem, 91-2.
Discontinuous frequency-functions, 12.
—— variate, examples of distribution according
to, 6-7.
Dispersion, measures of, 38-48; see also Standard
Deviation.
Distribution curve, 36-7.
—— functions, 12-15; determined by character-
istic function, 91-4; limiting properties of,
99-104, 110-13; determination by mo-
ments, 105-10; standard distributions,
116-63, see also Standard Distributions;
relation with probability, 172-3.
Doodson, A. T., relation of mean, median and
mode, 35, 46; refs., 47.
Dörge, K., refs., axiomatisation of von Mises’
theory of probability, 183.
Dressel, P. L., refs., seminvariants, 84-5, 285.


Edgeworth, F. Y., citing Weldon’s dice data, 
(Table 1.14), 19; form of Gram-Charlier 
series, 148-9; refs., law of error, 160. 

Edwards, J., refs., Integral Calculus, footnote, 68
and footnote, 221.

Eells, W. C., formulae for probable errors of 
correlation coefficients, 410, refs., 436. 

Egyptian skulls, distribution of, (Table 1.22), 
25. 

Elderton, E. M., data on health of son and alcohol- 
ism of father, (Exercise 14.12), 366. 
Elderton, Sir William P., Hardy’s method of calcu-
lating factorial moments, 59; corrections
for moments when the distribution is sym-
metrical, 85 and (Exercise 3.10), 87-8;
fitting of Pearson distributions, 143; on
Gram-Charlier series, 153; tables of χ², 293;
refs., Frequency Curves and Correlation, 85,
160.

Error, standard, see Standard Error. 

Estimates, of proportions of attributes, 199-200 ; 
in large samples generally, 201-3; of a 
ranking, 421. 

Euler-Maclaurin sum formula, 69. 

Expectation, 84. See also Mean Values. 

Extreme values of sample, distribution of, 217-22. 

Eye-colour, relation with hair-colour, (Example 
12.3), 299; in parent and child, (Example 
13.4), 314. 




Factorial moments, 56-60; definition, 56; in
terms of ordinary moments, 57-8; calcu-
lation of, 58-60; Sheppard's corrections
to, 77-8; generating function for, 90; of
binomial, 118; of hypergeometric, (Exer-
cise 5.4), 135.

Families deficient in room Space, distribution of, 
(Table 1.24), 27. 

Fathers, height of, distribution of Sons according 
to, (Table 14.3), 327. 

Fay, E. A., data from Marriages of the Deaf in 
America, (Table 1.19), 24. 

Fecundity, distribution of brood-mares according 
to, (Table 1.20), 24. 

Feeding and teeth in infants, (Example 12.5), 304. 

Fieller, E. C., refs. distribution of a ratio, 250. 

Filon, L. N. G., refs. (under Pearson), standard 
errors of frequency constants, 229. 

Finite populations, sampling from, 283-4.

Finney, D. J., sampling of variance ratio, (Exercise 
14.8), 365. 

Firms in the Food, Drink and Tobacco Trades, 
distribution of, (Table 1.17), 23. 

First hands at whist, distribution of, (Table 5.4),
128; (Example 12.1), 299-300.

First Limit Theorem, 100-1; converse of, 101-3.

Fisher, Arne, modified form of Gram-Charlier
series, 153; fitting of Type B to data
(Example 6.4), 156; refs., Frequency
Curves, 160.

Fisher, R. A., Sheppard’s corrections, 75-7;
introduction of word “cumulant,” 85;
random sampling numbers, 194, 197; dis-
tribution of mean deviation, 215; dis-
tribution of extreme, 220; z-distribution,
see z-distribution; k-statistics, 256, 268;
measures of departure from normality,
(Exercise 11.16), 289; tables of χ², 293;
normal approximation to χ², 294-5; dis-
tribution of χ² when parameter estimated
from data, 301; distribution of variances
and covariance in normal samples, 340;
transformation of correlation coefficient,
345; distribution of multiple correlation
coefficient (Exercises 15.6 and 15.7), 387;
refs., moments and cumulants in specifica-
tion of distributions, 160; mathematics of
statistics, 184; inverse probability, 184;
distribution of mean deviation, 228; dis-
tribution of extreme values, 228; distribu-
tion of correlation coefficient, 250, 363;
distribution of well-known statistics, 250;
applications of Student’s distribution, 250;
k-statistics, 285; distribution of χ², 305;
distribution of partial coefficients, 386;
distribution of multiple correlation, 386.

Food, Drink and Tobacco Trades, distribution of
firms, (Table 1.17), 23.


Footrule, Spearman's, 436. 
Franel, J., theorem on distribution of digits in 
mathematical tables, footnote, 193. 
Fréchet, M., proof of Second Limit Theorem, 112,
113, refs., 114.
Frequency (class-frequency), definition of, 2.
Frequency distributions, generally, 1-28; genesis
of, 18.
Frequency-functions, 12-15; discontinuous, 12;
determined by characteristic function, 91-4;
normalisation of, 156-9.
Frequency-polygons, 1; bivariate form, 20.
Friedman, M., tests of significance in m rankings,
420, refs., 436.
Frisch, R., moments of binomial, 58 and (Exercise
5.3) 135; refs., moments and cumulants,
85, 134; correlation analysis, 386.


Galbrun, H., convergence of Gram-Charlier Series, 
152. 

Galton, Sir Francis, data from Natural Inheritance,
(Example 13.4), 314; correlation, 363.

Galton’s ogive, see Distribution curve.

Galvani, L., refs. (under Gini), median for qualita-
tive characteristics, 47.

γ₁, γ₂ (skewness and kurtosis), 82.

Γ-function, in summing Poisson series, 122.

Garwood, F., data from, (Table 12.1), 297-8; 
refs., fiducial limits for Poisson distribution, 
305. 

Geary, R. C., distribution of measures of departure 
from normality, (Exercise 11.16), 289; 
distribution of a ratio, (Exercise 10.9), 253 
and refs., 250. 

Geiger, H., see Rutherford.

Generating functions, for moments and cumulants, 
90. See also Characteristic functions. 

Geometric mean, see Mean, geometric. 

German women, distribution of suicides, (Table 
1.6), 7. 

Gilby, W. H., data from, on intelligence and
clothing in schoolchildren, (Table 13.1), 320.

Gini, C., coefficient of concentration, 43; co-
efficient of mean difference, 42; standard
error, 216, 225; refs., mean difference, 47;
median for qualitative characteristics, 47.

Glossina morsitans (tsetse fly), distribution of 
trypanosomes in, (Table 1.18), 19; 

Goring, C., data on alcoholism and crime, (Table 
14.6), 356. 

Gosset, W. S., see “ Student.” 

Grades, 408; relation with ranks, 408-10. 

Graduation curve, see Distribution curve. 

Grain, distribution of plots according to yield of, 
(Table 1.18), 23. 

Gram-Charlier series, Type A, 147-54; Edge-
worth’s form, 148-50; fitting to bean
data, (Example 6.2), 151; distribution of
means from, (Exercise 10.7), 252; Type B,
154-6; Type C, 160.

Greenwood, M., data on industrial accidents, 
(Table 5.3), 124; on inoculation against 
cholera, (Example 13.1), 309. 

Grouping of frequency-distributions, corrections 
for, see Sheppard’s corrections. 

Gumbel, E. J., distribution of mth values, 220-2
and refs., 228.


Haines, J., refs. (under Pearson), use of range, 228. 

Hair-colour, relation with eye-colour, (Example 
12.3), 299-300. 

Haldane, J. B. S., refs., cumulants and moments
of binomial, 134 and (Exercise 5.1), 135;
χ² with small expectations, 305; normalisa-
tion of frequency functions, 305 and (Exer-
cise 12.1), 306.

Half-invariants (seminvariants), see Cumulants. 

Hall, Sir A. D., data on yield of grain, (Table 1.18), 
23. 

Hall, P., refs. distribution of mean from rectangular 
population, 250 ; multiple correlation, 386. 

Hamburger, H., problem of moments, 107 and
refs., 114.

Hardy, Sir G. F., calculation of factorial moments,
59.

Harmonic mean, see Mean, harmonic.

Hartley, H. O., distribution of range, 224 and
refs., 228.

Health of son and alcoholism of parent, (Exercise
14.12), 366.

Height, distribution of men according to, (Table 
1.7), 8; frequency-polygon of, (Figure 1.3), 
8; mean, (Example 2.1), 30-1; median, 
(Example 2.4), 35; quartiles, (Example 
2.5), 36; distribution curve, (Figure 2.2), 
37; mean deviation and standard devia- 
tion, (Example 2.6), 39-40; mean differ- 
ence, (Example 2.8), 45-6; factorial and 
ordinary moments (Example 3.7), 59-60 ; 
fitted to normal curve, (Table 5.5), 132; 
standard error of mean, (Example 9.1), 207. 

——, in fathers and sons, (Table 14.3), 327;
correlation, (Example 14.5), 337.

——, distribution of plants according to, (Table
8.1), 187.

Hellman, M., data on teeth and feeding in infants, 
(Example 12.5), 304. 

Helly, W., theorems on convergent sequences of 
functions, 100, 112. 

Helmert, W., distribution of mean deviation, 215 ; 
distribution of sums of squares, 250, 305. 

Henderson, J., refs., expansion in tetrachoric
functions, 160.

Hermite, C., polynomials, 145, 160. See Tcheby-
cheff-Hermite polynomials.



Heron, D., refs. (under Pearson), coefficients of 
association, 322. 

Heterotypic frequency-distributions, 145.

Highest audible pitch and age, bivariate distribu- 
tion according to, (Table 14.1), 325 ; corre- 
lations and regressions, (Example 14.1), 
331; correlation ratios (Example 14.11), 
351-2. 

Hilferty, M. M., limiting distribution of y?, 294-6 
and refs. (under Wilson), 305. 

Hilton, J., refs., inquiry by sample, 202. 

Histogram, 4; bivariate, 20. 

Hojo, T., refs., distribution of median, quartiles 
and semi-interquartile range, 228. 

Homoscedastic distributions, 335.

Hooker, R. H., data on weather and crops,
(Example 15.1), 373 and refs., 386.

Hotelling, H., distribution of Spearman's ρ, 401
and refs., 436.

Hsu, C. T., sampling cumulants of normal dis-
tribution, 275 and refs., 285.

Hypergeometric distribution, generally, 126-8;
moments of, 127; example of, (Table 5.4),
128; factorial moments of, (Exercise 5.4),
135; limiting forms of, 132-3.

—— function, 127.
Hypothetical population, 187. 


Illusory association, 317. 

Income, distribution of persons by, (Table 1.2), 3; 
histogram of, (Figure 1.2), 4; distribution 
curve of, (Figure 2.1), 37. 

Incomplete moments, 43; of binomial, refs., 134 
and (Exercises 5.2, 5.3), 135. 

Independence, definition, 21 ; in association tables, 
309; in bivariate frequency tables, 320. 

Index, distribution of, see Ratio. 

Induction, in finding sampling distributions,
246-8.

Inequalities for moments, 56; refs. (Shohat), 86;
Liapounoff's, 56 and (Exercise 3.14), 88.

Inoculation against cholera, see Cholera.

—— against tuberculosis in cattle, (Exercise 12.7),
307.

Intelligence, distribution of schoolchildren accord-
ing to, (Example 13.6), 320.

Interdecile range, 38. 

Interquartile range, 38; standard error of semi- 
interquartile range, (Example 9.8), 214. 

Interval (class-interval), see under Class. 

Intra-class correlation, 358-62. 

Inverse probability, 176 ; see Bayes’ theorem. 

Inversion theorem, 91-8; examples of use of, 
94-8. 

Irregular Kollektiv of von Mises, 171-2.

Irwin, J. O., distribution of means, (Exercises
10.3 and 10.4), 251 and refs., 251.



J-shaped distribution, 10. 

Jackson, Dunham, on indeterminacy of median,
46 and refs., 47.

Jeffreys, H., logic of probability, 165; refs.,
Theory of Probability, 184.

Jensen, A., refs., representative method in
statistics, 202.

Johannsen, W., bean data cited by Pretorius,
(Table 1.15), 20.

Johnson, W. E., logic of probability, 165; refs.,
Logic, 184.

Jordan, C., refs., Statistique mathématique, 160;
Type B series (Exercises 6.4 and 6.5), 161-2.

Jørgensen, N. R., tables of Tchebycheff-Hermite
polynomials, 147, 151; refs., Undersøgelser
over Frequensflader og Korrelation, 160.


k-statistics, definition, 256; general properties,
256-60; sampling cumulants of, 260-89;
in multivariate case, 281-3.

κ, as criterion of type in Pearson distributions, 140.

Kelley, T. L., tables of χ², 293; tables of correla-
tion coefficient, 375 and refs., 386; refs.,
Kelley Statistical Tables, 386.

Kendall, M. G., data from, (Table 1.4), 6; Shep-
pard corrections, 75; multivariate cumu-
lants, 80; randomness, 172; maximum
likelihood, 179; data from, (Table 8.3), 189;
Random Sampling Numbers, 193, 197;
refs., Sheppard corrections, 85; multi-
variate sampling formulae, 85; random-
ness, 184; maximum likelihood, 184;
randomness and random sampling numbers,
202; k-statistics, 285; rank correlation
and paired comparisons, 436.

Kendall, S. F. H., refs., distribution of Spearman’s
ρ, 436.

Keynes, J. M. (now Lord Keynes), on probability,
165; refs., Treatise on Probability, 184.

Kiser, C. V., refs., pitfalls in sampling, 202.

Koga, Y., data from, (Table 14.1), 325.

Kollektiv of von Mises, 171-2.

Kolmogoroff, A., probability as abstract ensembles,
165; refs., Grundbegriffe der Wahrschein-
lichkeitstheorie, 184.

Kondo, T., refs., standard error of mean square
contingency, 321, 322.

Kullback, S., refs., distributions and characteristic
functions, 251, 363; and (Exercises 10.2
and 10.5) 251, 252, (Exercise 14.7), 364-5.

Kurtosis, 82.


Lagrange, J. L., distribution of mean from rect-
angular population, 250.

Laplace, P. S. (Marquis de), characteristic func-
tions, 113; continued fraction for the
normal distribution, 129-30; succession
rule, (Example 7.7), 177; early work on
Central Limit Theorem, 180.

Large samples, approximations in theory of, 201-2.
See Standard Error.

Laterality of hand and eye, (Exercise 13.5), 323.

Latter, O. H., data on length of cuckoo's eggs,
(Exercise 14.13), 366.

Lawley, D. N., sampling cumulants of k-statistics,
275 and refs. (under Hsu), 285.

Least squares, in determination of regression lines,
328-9, 368.

Lee, A., data from, on fecundity of mares, (Table
1.20), 24; on stature of fathers and sons,
(Table 14.3), 327; refs., sampling of correla-
tion coefficient, (under Co-operative Study)
363.

Leibniz, G. W., logic of probabilities, 165.

Leptokurtosis, 82.

Lévy, P., refs., Calcul des Probabilités, 22, 184;
characteristic functions, 113, 114.

Liapounoff, A., inequality for moments, 56 and
(Exercise 3.14), 88; proof of Central Limit
Theorem, 180, 183; refs., limit theorems in
probability, 184.

Likelihood, 176; principle of maximum likelihood,
178-80; in estimating proportion of attri-
butes, 199-200; relation with Bayes'
theorem, 178-80 and (Exercise 8.5), 203.

Limit theorems for distributions, see First Limit
Theorem, Second Limit Theorem.

Lindeberg, J. W., condition for validity of Central
Limit Theorem, 181.

Linear regression, 327-9, 368-76.

Location, measures of, 29-38. See Mean, etc.

Lottery sampling, 192.


m rankings, 410-21.

Macaulay’s essays, distribution of sentence length
in, (Table 1.21), 25.

Male births, distribution of registration districts
according to, (Table 14.2), 326; constants
of, (Example 14.2), 364.

Malocclusion of teeth in infants, (Example 12.5),
304.

Markoff, A., refs., Second Limit Theorem, 113.

Marriages, distribution of Australian, see Aus-
tralian; of deaf in America, (Table 1.19),
24.

Martin, E. S., refs., corrections to moments, 85.

Maximum likelihood, 178. See Likelihood.

Mean, arithmetic, definition of, 29; properties of,
32; relation with median and mode, 35,
46; as first moment, 39; standard error
of, (Example 9.1), 207; distribution of, in normal samples
(Example 10.5), 238-9; in rectangular
samples (Examples 10.7 and 10.12), 240,
244; in Poisson distribution, (Example
10.9) 243; in binomial, (Example 10.8),
243; in Type III distribution (Example
10.11), 244.

Mean deviation, about mean, 38; about median,
38; standard error of, 215.

—— difference, 42; calculation of, 45 and (Exer-
cise 2.10) 48; standard error, 216-17, 225.

——, geometric, 32; less than arithmetic mean,
33-4; distribution of, 245-6; from rect-
angular population, (Example 10.13), 245;
from Type III distribution (Exercise 10.2),
251.

——, harmonic, 32; less than arithmetic and
geometric means, 33-4.

—— square contingency, 319.

—— values, 84; in sampling problems, 254-6.

Measures of location, dispersion and skewness,
29-48, 81-2.

Median, 34; relation with mean and mode, 35,
46; standard error of (Example 9.7), 213,
225.

Mehler, G., refs., expansion in tetrachoric series, 
363. 

Mendelian law, test of, as sampling of attributes, 
197-8; in pea breeding, (Example 12.2), 
299. 

Mercer, W., data from, (Table 1.18), 23. 

Merzrath, E., refs., bivariate frequency-distribu- 
tions and correlation, 85. 

Mesokurtosis, 82; in normal distribution, 129. 

Milk, costs of production of, (Table 1.9), 9. 

Milk-yield, distribution of cows according to, 
(Table 1.25), 27 ; covariance and variances, 
(Exercise 14.1), 364. 

Milne-Thomson, L. M., Calculus of Finite Differ- 
ences, footnote, 69. 

Miner, J. R., tables of correlation coefficients, 375 
and refs., 386. 

Mises, R. von, probability as limit in sequences, 
165, 171-2; refs., Wahrscheinlichkeit, Sta- 
tistik und Wahrheit, 184. 

Mode, 35; relation with median and mean, 35, 
46 ; standard error in Pearson distributions, 
225. 

Moments, preliminary, 39; Sheppard’s corrections
to, 41; definition, 49; about one point in
terms of those about another, 49; calcula-
tion of, 50-4; generating functions for,
54-6, 90; absolute moments, see Absolute;
factorial moments, see Factorial; in terms
of factorial moments, 57-8; relationship
with cumulants, 61-4; corrections for
grouping, 68-78; multivariate, 79-80;
corrections to multivariate, 80-1; as
characteristics of a distribution, 83-4;
problem of moments, 105-10; of binomial,
117, 118; of hypergeometric, 127; of
normal distribution, 129; standard errors
of, 204-11, 225; distribution of, 245. See
also Sheppard’s corrections, Second Limit
Theorem, Cumulants.

Montel, P., theorem on convergent sequences of 
functions, 100. 

Moore, G., data from, (Table 1.20), 24. 

Morant, G., refs., random occurrences in space and
time, 134; data from (Table 14.1), 325.

mth values, distribution of, 217-22. 

Multiple correlation, see Correlation. 

Multivariate: distributions, 19-22; normal dis- 
tribution, 376-7; sampling distributions, 
250; correlation, see Correlation; mo- 
ments and cumulants, 79-81; character- 
istic functions, 104-5; k-statistics, 281-3. 


Nair, U. S., distribution of mean difference, 216, 
225 and refs., 228.

Neyman, J., on theory of estimation, footnote, 
180; refs. estimation, 184; representa- 
tive method, 202; sampling from finite 
population, 284, 285. 

Nicholson, C., refs., distribution of a ratio, 251. 

Normal distribution, generally, 128-32 ; moments 
of, (Example 3.4) 53-4; cumulants of, 
(Example 3.10), 67; providing standard of 
kurtosis, 82; characteristic function of,
(Example 4.1), 94; as limit of binomial, 
(Example 4.6), 103; determined uniquely 
by its moments, (Example 4.7), 109-10; 
as limit of Poisson distribution, (Example 
4.8), 113; distribution function of, 129-30;
as one of Pearson's types, 141; in Central 
Limit Theorem, 180-3; in sampling of 
attributes, 198-9 ; distribution of mean in 
samples from, (Example 10.2), 234-6, 
(Example 10.3), 236-7, (Example 10.10), 
243; distribution of variance in samples 
from, (Example 10.5), 238-9 ; sampling of 
k-statistics from, 274; distribution of
measures of departure from, (Exercise
11.16), 288; bivariate form, see Bivariate;
multivariate form, 376-7. 

Normalisation of frequency-functions, 156-9. 


Norris, N., refs., inequalities among averages, 47.


Norton, J. P., data from Statistical Studies in the 
New York Money Market, (Table 1.26), 28. 


Ogburn, W. F., correlation of crime and religion, 
(Example 15.2), 375 and refs., 386. 

Ogive of Galton, see Distribution curve.

Oldis, E., refs., significance of correlation co- 
efficient, (under E. S. Pearson), 363.


Pabst, M. R., distribution of Spearman’s ρ, 401,
and refs. (under Hotelling), 436. 


Paciello, U., refs., calculation of mean difference, 
47. 




Paired comparisons, 421-36. 

Pairman, E., refs., corrections to abrupt distri- 
butions, 85.

Parameters, definition, 29; of location, 29-38; 
of dispersion, 38-48. 

Partial: association, 313-18; contingency, 321-2;
correlation, see Correlation; regression, see
Regression.

Pattern functions, in sampling k-statistics, 262-5,
277-8, 279, (Exercise 11.11), 287.

Pea breeding, (Example 12.2), 299.

Pearce, T. V., data from, (Table 1.23), 26. 

Pearse, G. E., data from, (Table 1.11), 10; refs.,
corrections when ordinates are infinite, 86.

Pearson, E. S., distribution of range, 223, 224; 
sampling of correlation coefficient, 316; 
distribution of √b₁, 280-1, (Exercise 11.17),
289 ; refs., range, 228 ; estimating standard 
deviation, 228; distribution of frequency 
constants in skew population, 228; tests 
for normality, 285 ; correlation coefficient, 
363; polychoric coefficients, (under K. 
Pearson), 363. 

Pearson, Karl, data from : trypanosomes, (Table 
1.13), 12; fecundity of mares, (Table 1.20), 
24; whist deals, (Table 5.4), 128; height 
of fathers and sons, (Table 14.3), 327; 
quoting data by Goring on crime, (Table 
14.6), 356; quoting data by Elderton on 
alcoholism, (Exercise 14.12), 366. 

Coefficient of variation, 43; measure of 
skewness, 81; coefficient of contingency, 
319-20 ; sampling of contingency coefficients, 
321; sampling of tetrachoric r, 356, and of
biserial r, 358; grades and Spearman’s ρ,
410. 

Refs., corrections to abrupt distributions 
(under Pairman), 85 ; skew variation, 134 ; 
moments of hypergeometric, 134; 15-con- 
stant frequency surface, 160; standard 
errors of frequency constants, 228-9 ; mean 
character of ranked individual, 229 ; distri- 
bution of χ², 251, 305; of difference of Type
III variates, (Exercise 10.6), 252 ; sampling 
of contingency coefficients, 322 ; multiple 
contingency, 322; sampling of correlation 
coefficient, (under Co-operative Study), 363 ; 
probable error of biserial r, 363; rank
correlation, 436. 

Pearson, M. V., refs., mean character of ranked 
individuals, 229.

Pearson distributions, as limit of hypergeometric, 
132-3; generally, 137-45; recurrence re-
lation for moments, 138; skewness of, 138;
inflections of, 138; fitting of, 143-5; 
quadrature of, 145; generalisation by 
Romanovsky, refs., 160, 161; distribution
of means from (ref. Irwin), 250. 


Pitman, E. J. G., refs., significance test applicable
to samples from any population, 436.

Platykurtosis, 82.

Poincaré, characteristic functions, 113.

Poisson distribution, generally, 120-2; [illegible]
of, (Example 6.9), 66; moments of, (Exercise
3.3), 86; normal distribution as limiting
form of, (Example 4.8), 113; distribution
function of, 122; in mixed populations,
122-4; bivariate form, (Exercise [illegible]);
sampling of attributes from, (Exercise
[illegible]), 203; distribution of means from,
(Example 10.9), 243.

Polynomials, see Tchebycheff-Hermite polynomials.

Populations, as basis of statistical [illegible];
existent, 18-19; hypothetical, 19; [illegible]
in sampling, 186-7.

Posterior probability, 176.

Potatoes, bias in [illegible] of yield, (Example 8.4),
189-90.

—— and wheat, correlation of yields, (Table
14.4), 333, (Example 14.3), 332-4.

Pretorius, S. J., data from, on Australian [illegible],
(Table 1.8), 9; on beans, (Table [illegible]),
and (Table 6.1), 150; refs., skew bivariate
distributions, 160.

Principle of Maximum Likelihood, 178. See
Likelihood.

—— of moments, 83; in fitting Pearson’s dis-
tributions, 143.

Prior probability, 176.

Probability, generally, 164-85; logic of, 165;
basic rules of direct probability, 166-70;
in a continuum, 170-1; von Mises’ ap-
proach, 171-2; and statistical distributions,
172-3; Bayes’ theorem, 178-8; inverse
probability, 176; posterior and prior, 176.

—— functions, 14.

Problem of moments, 105-10; refs., 113-14.

Product-moment correlation, see Correlation.


Quadrature of Pearson distributions, 145.

Quantiles, definition, 36; graphical determina-
tion of, 37-8; standard errors of, 211-13.

Quartiles, definition, 36; interquartile range as
measure of dispersion, 38; standard errors
of, 225.


Radioactive element (polonium), distribution of
particles from, (Table 6.2), 155, (Example
6.4), 156.

Ramsey, F. P., logic of probability, 165; refs.,
The Foundations of Mathematics, 184.

Random variables, definition and addition of,
173.

—— Sampling Numbers, 192-7.

Randomising machine, (Example 8.3), 189.


Randomness, 171; random sampling, generally, 
186-203; technique of, 191-7. 

Range, definition, 38; distribution of, 223-4. 

Rank correlation, see Correlation. 

Ranking, estimation of, 421. 

Rankings, problem of m, see m rankings. 

Ratio, distribution of, 248-9; Cramér’s theorem 
(Exercise 10.8), 252; Geary’s theorem 
(Exercise 10.9), 253; refs., 250-1. 

Rectangular population, transformation of fre- 
quency-distribution to, 18; as one of 
Pearson’s distributions, 142; distribution 
of mean of samples from, (Example 10.7), 
240 and (Example 10.12), 244; distribu- 
tion of geometric mean in samples from, 
(Example 10.13), 245-6. 

Recurrence relations for moments of binomial, 118. 

Registrar-General's Statistical Review of England 
and Wales, data from, (Table 1.1), 3; 
(Table 1.3), 5; (Table 1.11), 11. 

Registration districts, distribution according to 
births, (Table 14.2), 326.

Regression, definition, 327-9 ; coefficients of, 329 ; 
criterion for linearity of, 335-6; sampling 
of coefficients of, 336-7, 347-9; standard 
error of coefficients, 337 ; significance of, 
358-9; partials, 368-79; sampling of 
partials, 378-9. 

Religion, correlation with crime, (Example 15.2),
375-6. 

Reserves and bank deposits, distribution of,
(Table 1.26), 28.

Residuals, in regression equations, 369. 

Ritchie-Scott, A., refs., correlation coefficient of 
polychoric table, 363.

Romanovsky, V., refs., method of moments, 86 ; 
moments of hypergeometric, 134 and (Exer- 
cise 5.2), 135; generalisation of Pearson 
distributions, 160. 

Room-space, distribution of families deficient in, 
(Table 1.24), 27. 

Rothamsted Experimental Station, data from, 
(Table 8.1), 187. 

Rutherford, Lord, data on emission of radioactive 
particles, (Example 6.4), 156. 




St. Georgescu, N., refs., sampling moments, 285. 

Saltus, in distribution function, 14.

Sampling, preliminary, 174; simple, 174; random
sampling, see Random; sampling problem,
186; with and without replacement, 186-7;
randomness in, 187-97; lottery or ticket,
192; from continuous population, 197;
from attributes, 197-202.

—— distributions, 173-5; role in sampling
problems, 201; exact, 231-53; derivation
by analytical methods, 231-6, by geo-
metrical methods, 236-42, by characteristic
functions, 242-6, by induction, 246-8; of
a sum, 246-7; of a ratio, 248-9; multi-
variate, 250; approximations to, 254-89.
Sampling moments, generally, 254-89. See Cumu- 
lants, k-statistics. 
Scale reading, bias in, (Example 8.2), 188. 
Scarlet fever, deaths from, (Table 1.3), 5. 
Schoolchildren, distribution according to intelli- 
gence and clothing, (Example 13.6), 320. 
Second Limit Theorem, 110-13. 
Semi-interquartile range, as measure of skewness, 
38; standard error of, 215. 


Seminvariant statistics, 84-5, 256, refs. (Dressel
and Kendall), 285.

Seminvariants, 61, 84-5, refs., 84-5. See Cumu-
lants.

Sentences, distribution of according to length, 
(Table 1.21), 25. 

Sheppard, W. F., tables of normal distribution, 
130 and refs., 134; correlation coefficient, 
(Exercise 14.4), 364. 

Sheppard's corrections, 68-74; as average cor- 
rections, 74-5; for discrete data, 77, 
(Exercise 3.13), 88; to factorial moments, 
77-8; to cumulants, 78; multivariate 
case, 80-1; compared with sampling
fluctuations, 210. 

Shirley poppies, distribution of, (Table 1.5), 7. 

Shohat, J., refs., Stieltjes integrals, 22 ; inequalities 
for moments, 86; Second Limit Theorem, 
112, 113, 114. 

Shuffling of cards, see Card-shuffling. 

Simple sampling, 174. 

Skew distributions, 10. 

Skewness, 10; measures of, 81-2; of Pearson 
distributions, 138 ; standard error of, 225. 

Skulls, Egyptian, distribution of, (Table 1.22), 25. 

Sons, distribution of according to stature, (Table
14.3), 327.

Soper, H. E., refs., Frequency Arrays, 134;
sampling of correlation coefficient (under
Co-operative Study), 363.

Spahlinger vaccine, data on, (Exercise 12.7), 307. 

Spearman, C., coefficient of rank correlation,
388-91; sampling of, 394-403; footrule,
436; refs., rank correlation, 436. 

Square contingency, 319. See χ².

Standard deviation, 39; standard error of, 224. 

—— distributions, 116-36, 137-63. See under
Binomial, Hypergeometric, Poisson, Nor-
mal, Pearson distributions, Gram-Charlier
series, Normalisation of frequency-func-
tions.

—— errors, 199; in attributes, 199-201; gener-
ally, 204-30; compared with Sheppard
corrections, 210; of sum and difference,
226. (For standard errors of particular
statistics, see under those statistics.)




Standard measure, 43; effect on cumulants of trans-
formation to, 61; on characteristic func- 
tion of transformation to, (Example 4.6), 
103. 

Statistic, definition of, 2. 

Statistical hypothesis, 178. 

Statistical Abstract, data quoted from, (Table 1.2),
3.

—— Review of England and Wales, data from,
(Table 1.1), 3; (Table 1.3), 5; (Table 1.12),
11.

—— Studies in the New York Money Market
(Norton), 28.

Statistics, definition of, 1-2. 

Stature, see Height. 

Steffensen, J., on Type B series, 153, 154 ; refs., 
Recent Researches in the Theory of Statistics 
and Actuarial Science, 161.

Stereogram, 20-1. 

Stieltjes, J., problem of moments (Exercise 3.12), 
88, 106-7, 109; refs., 114. 

—— integrals, 15-16; refs. (Shohat), 22. 

Stigmatic rays, distribution of poppies according
to, (Table 1.5), 7.

Stouffer, K. A., distribution of difference of Type 
III variates, (Exercise 10.6), 252. 

“Student” (W. S. Gosset), refs., Poisson dis- 
tribution, 134; probable error of mean, 
251; sampling of Spearman’s coefficients 
of rank correlation, 436.

“Student’s” distribution, (Example 10.6), 239-
40; (Example 10.17), 248; in testing
correlation coefficient, 343; in testing
Spearman’s ρ, 401; in testing regression
coefficients, 349.

Succession rule of Laplace, (Example 7.7), 177. 

Suicides, distribution of, (Table 1.6), 7. 

Sum of two variates, distribution of, 246-7. 

Sur-tax and super-tax, distribution of incomes 
liable to, (Table 1.2), 3 and histogram 
(Fig. 1.2), 4.




t-distribution, see “Student’s” distribution.

Tchebycheff, P. L., problem of moments, 114; 
inequality, (Exercise 8.4), 203. 

Tchebycheff-Hermite polynomials, 145-7; refs., 
160.

Teeth and feeding in infants, (Example 12.5), 304. 

Telephone directory, distribution of digits from,
(Table 1.4), 6, 193.

Term of abortion, distribution of women according 
to, (Table 1.23), 26. 

Tetrachoric functions, 151, 356.

54-6. 

Thiele, T. N., cumulants, 61; quotation about
oracles, 178; sampling cumulants, 256;
refs., Theory of Observations, 86, 285.

Thompson, C., tables of χ², 294.


Ticket sampling, 192.

Tippett, L. H. C., Random Sampling Numbers,
193, 197; distribution of extreme values,
220 and refs., 229; distribution of range,
223-4 and refs., 229.

Tocher, J. F., data from, (Table [illegible]), 27.

Transformation of a variate, 16, 21-2.

Trigonometrical representation of correlations,
372.

Truncated distributions, 11.

Trypanosomes, distribution of, (Table 1.13), 12.

Tschuprow, A. A., sampling moments, 256; co-
efficient of contingency, 320; refs., sampling
moments, 284, 285.

Tsetse flies, distribution of trypanosomes in,
(Table 1.13), 12.

Type A; Type B series, see Gram-Charlier series.

Type I distribution, 139-40.

—— II distribution, 141-2; distribution of means
from (Exercise 10.4), 251.

—— III distribution, characteristic function and
moments of, (Example [illegible]); cumu-
lants of, (Example 3.11); generally,
142; as sampling distribution of sum of
variances, 231-3; distribution of means
from (Example 10.11), 244; of geometric
[illegible], 251; dis-
tribution of differences from, (Exercise
10.6), 252.

—— IV distribution, 140-1.

—— V distribution, 141; moments and cumu-
lants, (Exercise 3.12), 67-8.

—— VI distribution, 140.

—— VII distribution, 142; moments of, (Exer-
cise 3.1), 86.

Types VIII-XII distributions, 142-3.


U-shaped distributions, 10-11.

Unbiased estimates, [illegible].

Uni[illegible] distributions, 29.

Uspensky, J. V., refs., Central Limit Theorem,
183; Introduction to Mathematical Proba-
bility, 184, 251.


Variable, random, see Random variable.

Variance, 39; as half mean-square of differences,
42; standard error of, 224; distribution
of, in normal samples, (Example 10.5),
238-9, (Example 10.14), 246; of second
mean-moment, (Example 11.2), 265;
[illegible] moment of, (Example 11.3), 260.

Variate, definition of, 2; transformations of,
16-18, 21-2.

Variation, coefficient of, 43; standard error of,
(Example 9.5), 209, 224.

Venn, J. A., refs., Logic of Chance, 184.

Vigor, H. D., data from, (Table 14.2), 326.


Wallis, W. A., refs., correlation ratio for ranked 
data, 437 and (Exercise 16.3), 437. 
Weather, correlation with crops, (Example 15.1),
373-5.

Weierstrass, K., diagonal process, 100; theorem 
on series of polynomials, (Exercise 4.7), 115. 

Weight, distribution of men according to, (Table
1.10), 10.

Weldon, W. F. R., dice-throwing data, (Table 1.14),
19; (Table 1.16), 23; (Table 5.1), [illegible];
(Example 8.10), 199.

Wheat, correlation of yields with potatoes, (Table 
14.4), 333; (Example 14.3), 332-4. 


—— plants, distribution of ranks according to
height, (Table 8.1), 187.

Whist, distribution of first hands at, (Table 5.4),
128; (Example 12.1), 297-8.

Whitaker, L., refs., Poisson distribution, 134. 

Wicksell, S. D., example from, (Example 14.4), 336. 

Willcox, W. F., definitions of statistics, 1, refs., 22.

Wilson, E. B., limiting distribution of χ², 294-6
and refs., 305. 

Wishart, J., introduction of word “cumulant,”
85; refs.: Romanovsky’s generalisation
of Pearson distributions, 161; derivation
of pattern formulae (under Fisher), 285;
sampling cumulant formulae, 285; dis-
tribution of multiple correlation and cor-
relation ratios, 386.

Wold, H., Sheppard’s corrections, 71, 80; refs., 86.

Women, distribution of according to term of 
abortion, (Table 1.23), 26.

Woo, T. L., data from, on skulls (Table 1.22), 25 ; 
on association of hand and eye, (Exercise 
13.5), 323 ; tables of correlation ratio, 354. 




Yasukawa, K., refs., standard error of mode,
225.

Yates, F., data from, height of plants, (Table 8.1),
187; Random Sampling Numbers, 194,
197; tables of χ², 293; correction to χ²
for grouping, 303 and (Example 12.3), 304;
refs., bias in sampling, 202; correction to
χ², 305.

Yield of grain, distribution of, (Table 1.18), 23;
of wheat and potatoes, correlation of,
(Table 14.4), 333, (Example 14.3), 332-
334.

Young, A. W., refs., sampling on correlation co-
efficient (under Co-operative Study), 363.

Yule, G. Udny, data from, poppies, (Table 1.5), 7;
sentence length, (Table 1.21), 25; industrial
accidents, (Table 5.3), 124; prints on
photographic paper, (Exercise 12.8), 307;
inoculation against cholera, (Example 13.1),
309; births and registration districts,
(Table 14.2), 326; data compiled from
Fay, (Table 1.19), 24.

Negatively indexed binomials, (footnote),
125; normal distribution (Exercise 5.0),
136; bias in scale-reading, 188; tables of
χ², 293; coefficients of association, 310-13;
refs., reading a scale, 202; degrees of
freedom in contingency tables, 305; theory
of correlation, 363, 385, 386.


z-distribution, (Example 10.18), 249; in testing 
correlation ratio, 353-4 ; in testing multiple 
correlation coefficient, 381-2; in testing
concordance in rankings, 419. 


